CONSISTENCIES AND RATES OF CONVERGENCE OF JUMP-PENALIZED LEAST SQUARES ESTIMATORS
We study the asymptotics for jump-penalized least squares regression aiming at approximating a regression function by piecewise constant functions. Besides conventional consistency and convergence rates of the estimates in $L^2([0,1))$, our results cover other metrics like the Skorokhod metric on the space of càdlàg functions and uniform metrics on $C([0,1])$. We will show that these estimators are in an adaptive sense rate optimal over certain classes of "approximation spaces." Special cases are the class of functions of bounded variation, (piecewise) Hölder continuous functions of order $0 < \alpha \le 1$ and the class of step functions with a finite but arbitrary number of jumps. In the latter setting, we will also deduce the rates known from change-point analysis for detecting the jumps. Finally, the issue of fully automatic selection of the smoothing parameter is addressed.

1. Introduction. We consider regression models of the form
$$Y_i^n = \bar f_i^n + \xi_i^n, \qquad i = 1, \dots, n, \tag{1}$$
where $(\xi_i^n)_{n\in\mathbb N,\, 1\le i\le n}$ is a triangular scheme of independent zero-mean sub-Gaussian random variables and $\bar f_i^n$ is the mean value of a square integrable function $f \in L^2([0,1))$ over an appropriate interval $[x_{i-1}^n, x_i^n)$ [see, e.g., Donoho (1997)],
$$\bar f_i^n = \frac{1}{x_i^n - x_{i-1}^n}\int_{x_{i-1}^n}^{x_i^n} f(t)\,dt. \tag{2}$$
For ease of notation, we will mostly suppress the dependency on n in the 
sequel. 

When trying to recover the characteristics of the regression function in applications, we frequently face situations where the most striking features are sharp transitions, called change points, edges or jumps [for data examples see Fredkin and Rice (1992), Christensen and Rudemo (1996), Braun, Braun and Müller (2000)]. To capture these features, in this paper we study a reconstruction of the original signal by step functions, which results from a least squares approximation of $Y = (Y_1, \dots, Y_n)$ penalized by the number of jumps. More precisely, we consider minimizers $T_\gamma(Y) \in \operatorname{argmin} H_\gamma(\cdot, Y)$ of the Potts functional
$$H_\gamma(u, Y) = \frac{1}{n}\sum_{i=1}^{n}(u_i - Y_i)^2 + \gamma \cdot \#J(u). \tag{3}$$
Here $J(u) = \{i : 1 \le i \le n-1,\ u_i \ne u_{i+1}\}$ is the set of jumps of $u \in \mathbb R^n$. Note that the minimizer is not necessarily unique.

The name Potts functional refers to a model which is well known in statistical mechanics and was introduced by Potts (1952) as a generalization of the Ising model [Ising (1925)] for a binary spin system to more than two states. The original model was considered in the context of Gibbs fields with energy equal to the above penalty.

Various other strategies for dealing with discontinuities are known in the literature. Kernel regression, as a (linear) nonparametric method, offers various ways to identify jumps in the regression function, essentially by estimating modes of the derivative; see, for example, Hall and Titterington (1992), Loader (1996), Müller (1992) or Müller and Stadtmüller (1999). Other approaches like local M-smoothers [Chu et al. (1998)], sigma-filters [Godtliebsen, Spjøtvoll and Marron (1997)], chains of sigma-filters [Aurich and Weule (1995)] or adaptive weights smoothing [Spokoiny (1998), Polzehl and Spokoiny (2003)] are based on nonlinear averages which mimic robust M-estimators [cf. Hampel et al. (1986)] near discontinuities. Therefore, they do not blur the jump as much as linear methods would.

The case when the regression function is a step function was first studied by Hinkley (1970) and later by Yao (1988) and Yao and Au (1989). Given a known upper bound for the number of jumps, Yao and Au (1989) derive the optimal $O(n^{-1/2})$ and $O(n^{-1})$ rates for recovering the function in an $L^2$ sense and detecting the jump points, respectively. Their results have been generalized to overdispersion models and applied to DNA segmentation by Braun, Braun and Müller (2000). Without the constraint of a known upper bound for the number of jumps, Birgé and Massart (2007) give a nonasymptotic bound for the MSE for a slightly different penalty.

In this more general setting we will deduce the same (parametric) rates as Yao and Au (1989) for the Potts minimizer if $f$ is piecewise constant with a finite but arbitrary number of jumps. We show that the estimate asymptotically reconstructs the correct number of jumps with probability 1. Further, we will give (optimal) rates in the Skorokhod topology, which provides simultaneous convergence of the jump points and of the graph of the function. As far as we know, this approach is new to regression analysis.

If the true regression function is not a step function, the Potts minimizer cannot compete in terms of rate of convergence under smoothness assumptions stronger than $C^1$. This is due to the nonsmooth approach of approximation via step functions and could be improved by fitting polynomials between estimated jumps [see Spokoiny (1998), Kohler (1999)]. For less smooth functions, however, we will show that it is adaptive and attains optimal rates of convergence. To this end, we prove rates of convergence in certain classes of "approximation spaces" well known in approximation theory [DeVore and Lorentz (1993)]. To our knowledge, these spaces have not been introduced to statistics before. As special cases, we obtain (up to a logarithmic factor) the optimal $O(n^{-1/3})$ and $O(n^{-\alpha/(2\alpha+1)})$ rates if $f$ is of bounded variation or if $f$ is (piecewise) Hölder continuous on $[0,1]$ of order $0 < \alpha \le 1$, respectively. The logarithmic factor occurs since we give almost sure bounds instead of the more commonly used stochastic or mean square error bounds. Optimality in the class of functions with bounded variation shows that the Potts minimizer has the attribute of "local adaptivity" [Donoho et al. (1995)]. Under the assumption that the error is bounded, Kohler (1999) obtained nearly the same rates (worse by an additional logarithmic term) in these Hölder classes for the mean square error of a similar estimator.

We stress that minimizing (3) results in a step function, that is, a regressogram in the sense of Tukey (1961). Hence, this paper also answers the question of how to choose the partition of the regressogram in an asymptotically optimal way [cf. Eubank (1999)] over a large scale of approximation spaces.
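To make (3) concrete, here is a minimal sketch in Python with NumPy (the helper names `jump_set` and `potts_functional` are hypothetical, not from the paper) that evaluates $H_\gamma(u, Y)$ for a candidate step vector $u$. The two calls at the end illustrate the remark that the minimizer need not be unique: for $Y = (0, 1)$ and $\gamma = 1/4$, the constant fit and the exact fit attain the same minimal value.

```python
import numpy as np


def jump_set(u):
    """Indices i (1-based, 1 <= i <= n-1) with u_i != u_{i+1}, i.e. the set J(u)."""
    u = np.asarray(u, dtype=float)
    return [int(i) + 1 for i in np.flatnonzero(u[:-1] != u[1:])]


def potts_functional(u, y, gamma):
    """H_gamma(u, y) = (1/n) * sum_i (u_i - y_i)^2 + gamma * #J(u), as in (3)."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((u - y) ** 2) + gamma * len(jump_set(u))


y = [0.0, 1.0]
print(potts_functional([0.5, 0.5], y, 0.25))  # constant fit: residual 0.25, no jump
print(potts_functional([0.0, 1.0], y, 0.25))  # exact fit: no residual, one jump
```

Both candidates evaluate to 0.25, and no other $u$ does better for these data, so the argmin in (3) contains at least two elements.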

Subset selection and TV penalization. Our results can be viewed as a result on subset selection in a linear model $Y = \alpha + \beta^T X + \varepsilon$ with covariates $X$. In this context our estimator minimizes the functional
$$L_n(\alpha, \beta) := \sum_{i=1}^n \Big(Y_i - \alpha - \sum_{j=1}^k \beta_j X_{ij}\Big)^2 \quad \text{subject to } \#\{j : \beta_j \ne 0\} \le N,$$
or, what is equivalent for a proper choice of $\gamma$ (depending on $N$), the functional
$$L_n(\alpha, \beta) + \gamma\, \#\{j : \beta_j \ne 0\}.$$
Setting $k = n - 1$ as well as $X_{ij} = 1$ for $j < i$ and $X_{ij} = 0$ otherwise, we obtain the Potts functional (3) (up to the normalization $1/n$) with $u_1 = \alpha$ and $u_i = \alpha + \sum_{j=1}^{i-1}\beta_j$ for $2 \le i \le n$. In general, to select the correct variables, one requires a kind of oversmoothing, which is reflected by our results in the present paper. The Potts smoother in (3) achieves this by means of an $\ell_0$ penalty, and for nearly uncorrelated predictors it is well known that $\ell_1$ penalization has almost the same properties as complexity-penalized least squares regression [cf. Donoho (2006a, 2006b)].
However, as a variable selection problem, detection of jumps in regression has a special feature, namely, the covariates $X_{ij}$ are highly correlated and these results do not apply. A similar comment applies to TV penalized estimation, as considered, for example, by Mammen and van de Geer (1997), which aims at minimizing
$$F_\gamma(u, Y) = \gamma \sum_{1\le i\le n-1} |u_i - u_{i+1}| + \sum_{i=1}^n (u_i - Y_i)^2.$$
This can also be viewed in this context. Choosing $X_{ij}$ as above, it is a special case of the lasso, which was introduced by Tibshirani (1996) and minimizes $L_n(\alpha, \beta)$ subject to $\sum_{j=1}^k |\beta_j| \le t$. Again, for (nearly) uncorrelated predictors, the lasso comes close to the $\ell_0$ solution. Thus, the relation of the Potts functional to the total variation penalty is roughly the same as the relation of subset selection to the lasso. In fact, for highly correlated predictors, the relationship between $\ell_0$ and $\ell_1$ solutions is much less understood, and this question is beyond the scope of the paper. However, it seems that in our case $\ell_1$ penalization performs suboptimally. As an indication, from Mammen and van de Geer (1997), Theorem 10, we obtain an upper rate bound of $O_P(n^{-\alpha/3})$ for the error of the total variation penalized least squares estimator of an $\alpha$-Hölder continuous function, in contrast to the (optimal) rate of $O_P(n^{-\alpha/(2\alpha+1)})$ achieved by the Potts minimizer.

A reason for this difference is that the Potts functional will generally lead to fewer but higher jumps in the reconstruction, and hence is even more sparse than $\ell_1$ or TV based reconstructions. In general, a side phenomenon related to such sparsity of an estimator is a bad uniform risk behavior [see Pötscher and Leeb (2008)]. Although the conditions of that paper are not fulfilled in our model (basically, contiguity of the error distributions will fail), this phenomenon can be observed numerically in our situation. Our estimate will fail when the number of jumps grows too fast with the number of observations, and small plateaus in the data will not be captured. However, our emphasis is on estimation of the main data features (here, jumps) to obtain a sparse description of data, similar in spirit to Davies and Kovac (2001).
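The reparametrization behind the subset-selection view of the Potts functional can be checked numerically. The sketch below (Python with NumPy; `design_matrix`, `to_regression`, `l0_penalty` and `tv_penalty` are hypothetical names) builds the $n \times (n-1)$ design with $X_{ij} = 1$ for $j < i$, recovers $(\alpha, \beta)$ from a step vector $u$ via $\alpha = u_1$ and $\beta_j = u_{j+1} - u_j$, and compares the $\ell_0$ penalty $\#\{j : \beta_j \ne 0\} = \#J(u)$ used by the Potts functional with the $\ell_1$ (total variation) penalty $\sum_j |\beta_j|$ used by TV penalization.

```python
import numpy as np


def design_matrix(n):
    """X with X[i, j] = 1 if j < i (0-based row i, column j) and 0 otherwise; shape (n, n-1)."""
    return np.tril(np.ones((n, n - 1)), k=-1)


def to_regression(u):
    """Map a vector u to (alpha, beta) such that u = alpha + X @ beta."""
    u = np.asarray(u, dtype=float)
    return u[0], np.diff(u)


def l0_penalty(beta):
    return int(np.count_nonzero(beta))   # equals the number of jumps #J(u)


def tv_penalty(beta):
    return float(np.sum(np.abs(beta)))   # equals sum_i |u_i - u_{i+1}|


u = np.array([2.0, 2.0, 5.0, 5.0, 1.0])
alpha, beta = to_regression(u)
X = design_matrix(len(u))
print(np.allclose(alpha + X @ beta, u))   # the reparametrization is exact
print(l0_penalty(beta), tv_penalty(beta))  # 2 jumps, total variation 7
```

The example shows why the two penalties behave differently: the $\ell_0$ penalty charges each jump the same amount regardless of its height, while the TV penalty charges the jump sizes, which favors many small jumps over few large ones.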

Computational issues. In general, a major burden of $\ell_0$ penalization is that it leads to optimization problems which are often NP hard, so that relaxation of the functional becomes necessary or other penalties, such as $\ell_1$, have to be used. Interestingly, computation of the minimizer of the Potts functional in (3) is a notable exception. The family $(T_\gamma(Y))_{\gamma>0}$ can be computed in $O(n^3)$ and the minimizer for one $\gamma$ in $O(n^2)$ steps [see Winkler and Liebscher (2002)]. At the heart of that result is the observation that the set of partitions of a discrete interval carries the structure of a directed acyclic graph, which makes dynamic programming directly applicable [see Friedrich et al. (2008)].
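The dynamic programming idea for a single $\gamma$ can be illustrated by the following minimal sketch. It is written in Python with NumPy, the function names are hypothetical, and it is a plain "optimal partitioning" recursion in the spirit of the cited work rather than a transcription of Winkler and Liebscher's implementation. $B(r)$ is the optimal value of (3) restricted to the first $r$ observations, and the last segment $y_{l+1}, \dots, y_r$ is fitted by its mean, which gives $O(n^2)$ time with prefix sums. A brute-force search over all $2^{n-1}$ jump sets is included as a check for small $n$.

```python
import itertools

import numpy as np


def potts_minimizer(y, gamma):
    """One minimizer u of mean((u - y)**2) + gamma * #J(u), computed in O(n^2) time."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))       # prefix sums of y
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))  # prefix sums of y^2
    B = np.empty(n + 1)            # B[r]: optimal value for y[:r]
    B[0] = -gamma                  # k segments then cost (k - 1) * gamma in total
    last = np.zeros(n + 1, dtype=int)  # start of the last segment of an optimal fit of y[:r]
    for r in range(1, n + 1):
        ls = np.arange(r)                                    # candidate starts l = 0..r-1
        sse = s2[r] - s2[ls] - (s1[r] - s1[ls]) ** 2 / (r - ls)  # RSS of y[l:r] about its mean
        costs = B[ls] + gamma + sse / n
        best = int(np.argmin(costs))
        B[r], last[r] = costs[best], best
    u = np.empty(n)
    r = n
    while r > 0:                   # backtrack and fill each segment with its mean
        l = last[r]
        u[l:r] = (s1[r] - s1[l]) / (r - l)
        r = l
    return u


def potts_value(u, y, gamma):
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    return np.mean((u - y) ** 2) + gamma * np.count_nonzero(np.diff(u))


def brute_force_value(y, gamma):
    """Minimum of (3) over all jump sets; exponential cost, for tiny n only."""
    y = np.asarray(y, dtype=float)
    n, best = len(y), np.inf
    for mask in itertools.product([False, True], repeat=n - 1):
        cuts = [0] + [i + 1 for i, cut in enumerate(mask) if cut] + [n]
        u = np.concatenate([np.full(b - a, y[a:b].mean()) for a, b in zip(cuts, cuts[1:])])
        best = min(best, np.mean((u - y) ** 2) + gamma * (len(cuts) - 2))
    return best
```

For instance, `potts_minimizer([0.0] * 5 + [1.0] * 5, 0.01)` returns the noise-free step itself, while a penalty as large as $\gamma = 1$ makes the constant fit $0.5$ optimal.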

The paper is organized as follows: after introducing some notation in Section 2, we provide in Section 3.1 the rates and consistency results for step functions and general bounded functions in the $L^2$ metric. In Section 3.2 we present results on the convergence of the jump sets in the Hausdorff metric, and in Section 3.3 results on the convergence of the regression function in the Skorokhod topology. In Section 3.4 we introduce a simple data-driven parameter selection strategy resulting from our previous results and compare it to a multiresolution approach as in Davies and Kovac (2001). We briefly discuss relations to other models, such as Bayesian imaging, and extensions to higher dimensions in Section 4. Technical proofs are given in the Appendix.

This paper is complemented by the work of Boysen et al. (2007), which contains technical details of some of the proofs, the consistency of the estimates for more general noise conditions and the consistency of the empirical scale space $(T_\gamma(Y))_{\gamma>0}$ toward its deterministic target [cf. Chaudhuri and Marron (2000)].

2. Model and notation. For a functional $F : \Theta \to \mathbb R \cup \{\infty\}$, we denote by $\operatorname{argmin} F$ the subset of $\Theta$ consisting of all minimizers of $F$. Let
$$S([0,1)) = \Big\{ f : f = \sum_{i=1}^n a_i 1_{[t_i, t_{i+1})},\ a_i \in \mathbb R,\ 0 = t_1 < \cdots < t_{n+1} = 1,\ n \in \mathbb N \Big\}$$
denote the space of right-continuous step functions and let $D([0,1))$ denote the càdlàg space of right-continuous functions on $[0,1]$ with left limits and left-continuous at 1. Both will be considered as subspaces of $L^2([0,1))$ with the obvious identification of a function with its equivalence class, which is injective for these two spaces. $\|\cdot\|$ will denote the norm of $L^2([0,1))$, and the norm on $L^\infty([0,1))$ is denoted by $\|\cdot\|_\infty$.
Minimizers of the Potts functionals (3) will be embedded into $L^2([0,1))$ by the map $\iota_n : \mathbb R^n \to L^2([0,1))$,
$$\iota_n((u_1, \dots, u_n)) = \sum_{i=1}^n u_i 1_{[(i-1)/n,\, i/n)}. \tag{4}$$
Under the regression model (1), this leads to estimates $\hat f_n = \iota_n(T_{\gamma_n}(Y))$, that is,
$$\hat f_n \in \iota_n\big(\operatorname{argmin} H_{\gamma_n}(\cdot, Y)\big). \tag{5}$$
Here and in the following, $(\gamma_n)_{n\in\mathbb N}$ is a (possibly random) sequence of smoothing parameters. We suppress the dependence of $\hat f_n$ on $\gamma_n$ since this choice will be clear from the context.

For the noise, we assume the following uniform sub-Gaussian condition. For a discussion of how this condition can be weakened, see Boysen et al. (2007).

Condition (A). The triangular array $(\xi_i^n)_{n\in\mathbb N,\, 1\le i\le n}$ of random variables obeys the following properties.

(i) For all $n \in \mathbb N$ the random variables $(\xi_i^n)_{1\le i\le n}$ are independent.

(ii) There is a universal constant $\beta \in \mathbb R$ such that $E e^{\nu \xi_i^n} \le e^{\beta\nu^2}$ for all $\nu \in \mathbb R$, $1 \le i \le n$, and $n \in \mathbb N$.

Finally, we recall the definition of Hölder classes. We say that a function $f : [0,1] \to \mathbb R$ belongs to the Hölder class of order $0 < \alpha \le 1$ if there exists $C > 0$ such that
$$|f(x) - f(y)| \le C|x - y|^\alpha \quad \text{for all } x, y \in [0,1].$$



3. Consistency and rates. In order to extend the Potts functional in (3) to $L^2([0,1))$, we define for $\gamma > 0$ the continuous Potts functionals $H_\gamma^\infty : L^2([0,1)) \times L^2([0,1)) \to \mathbb R \cup \{\infty\}$:
$$H_\gamma^\infty(g, f) = \begin{cases} \gamma \cdot \#J(g) + \|f - g\|^2, & \text{if } g \in S([0,1)),\\ \infty, & \text{otherwise.} \end{cases}$$
Here $J(g) = \{t \in (0,1) : g(t-) \ne g(t+)\}$ is the set of jumps of $g \in S([0,1))$. By definition, we have for every $g \in \operatorname{argmin} H_\gamma^\infty(\cdot, f)$ that $H_\gamma^\infty(g, f) \le H_\gamma^\infty(0, f) = \|f\|^2$, and therefore $\#J(g) \le \gamma^{-1}\|f\|^2$ for $\gamma > 0$. Since a minimizer is uniquely determined by its set of jumps, minimizing $H_\gamma^\infty(\cdot, f)$ can be reduced to a minimization problem on the compact set of jump configurations with not more than $\gamma^{-1}\|f\|^2$ jumps, which implies existence of a minimizer. For $\gamma = 0$, we set $H_0^\infty(g, f) = \|f - g\|^2$ for all $g \in L^2([0,1))$; hence:

Lemma 1. For any $f \in L^2([0,1))$ and all $\gamma \ge 0$ we have $\operatorname{argmin} H_\gamma^\infty(\cdot, f) \ne \emptyset$.

In order to keep the presentation simple, we choose throughout the following an equidistant design $x_i^n = i/n$ in the models (1) and (2). All results given remain valid for designs with design density $h$ such that $\inf_{t\in[0,1]} h(t) > 0$ and $h$ is Hölder continuous on $[0,1]$ of order $\alpha > 1/2$. Moreover, for all theorems in this section we will assume that $Y^n$ is determined through (1) and the noise $\xi^n$ satisfies Condition (A).

3.1. Convergence in $L^2$. We investigate the asymptotic behavior of the Potts minimizer when the sequence $(\gamma_n)_{n\in\mathbb N}$ converges to a constant $\gamma$, for $\gamma > 0$ and $\gamma = 0$, respectively. If $\gamma > 0$, we do not recover the original function in the limit, but a parsimonious representation at a certain scale of interest determined by $\gamma$. For $\gamma = 0$ the Potts minimizer is consistent for the true signal under some conditions on the sequence $(\gamma_n)_{n\in\mathbb N}$:

(H1) $(\gamma_n)_{n\in\mathbb N}$ satisfies $\gamma_n \to 0$ and $\gamma_n n/\log n \to \infty$ $P$-a.s.

For the consistency in approximation spaces in Theorem 2, we consider instead

(H2) $(\gamma_n)_{n\in\mathbb N}$ satisfies $\gamma_n \to 0$ and $\gamma_n \ge (1 + \delta)\, 12\beta \log n/n$ $P$-a.s. for almost every $n$ and some $\delta > 0$. Here $\beta$ is given by the noise Condition (A).

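Theorem 1. (i) Assume that $f \in L^2([0,1))$ and $\gamma > 0$ are such that $f_\gamma$ is the unique minimizer of $H_\gamma^\infty(\cdot, f)$. Moreover, suppose $(\gamma_n)_{n\in\mathbb N}$ satisfies $\gamma_n \to \gamma$ $P$-a.s.; then
$$\hat f_n \xrightarrow[n\to\infty]{L^2([0,1))} f_\gamma \quad P\text{-a.s.}$$

(ii) Let $f \in L^2([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ fulfill (H1). Then
$$\hat f_n \xrightarrow[n\to\infty]{L^2([0,1))} f \quad P\text{-a.s.}$$

(iii) Let $f \in S([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ fulfill (H1). Then
$$\|\hat f_n - f\| = O\big(\sqrt{\log n/n}\big) \quad P\text{-a.s.}$$
Moreover, $\|\hat f_n - f\| = O_P(n^{-1/2})$.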
We stress that the parametric rates in Theorem 1(iii) are obtained for a broad range of rates for the sequence of smoothing parameters. It is only required that $\gamma_n$ converges to zero more slowly than $\log n/n$. When trying to extend these results to more general function spaces, the question arises which properties of the true regression function $f$ determine the almost sure rate of convergence of the Potts estimator. It turns out that the answer lies in the speed of approximation of $f$ by step functions. Let us introduce the approximation error
$$\Delta_k(f) := \inf\{\|g - f\| : g \in S([0,1)),\ \#J(g) \le k\} \tag{6}$$
and the corresponding approximation spaces
$$\mathcal A^\alpha = \Big\{ f \in L^\infty([0,1]) : \sup_{k\ge 1} k^\alpha \Delta_k(f) < \infty \Big\}$$
for $\alpha > 0$. The following theorem gives the almost sure rates of convergence for these spaces.

Theorem 2. If $f \in \mathcal A^\alpha$ and $(\gamma_n)_{n\in\mathbb N}$ satisfies condition (H2), then
$$\|\hat f_n - f\| = O\big(\gamma_n^{\alpha/(2\alpha+1)}\big) \quad P\text{-a.s.}$$
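The approximation error (6), which drives Theorem 2, can be explored numerically. The following sketch is an illustration under assumptions, not part of the paper: it represents $f$ by its values at the midpoints of $N$ equal cells of $[0,1)$, restricts jumps to cell boundaries, and computes the best approximation with at most $k$ jumps by a segment-constrained dynamic program in $O(kN^2)$ time (the name `delta_k` is hypothetical). For $f(x) = x$ the optimum uses $k+1$ equal pieces, so $\Delta_k(f) \approx 1/(\sqrt{12}(k+1))$ and $k\,\Delta_k(f)$ stays bounded, in line with $f \in \mathcal A^1$.

```python
import numpy as np


def delta_k(fvals, k):
    """Discretized Delta_k(f): L2 distance from the step function with values fvals
    (on N equal cells of [0, 1)) to its best approximation with at most k jumps."""
    f = np.asarray(fvals, dtype=float)
    N = len(f)
    s1 = np.concatenate(([0.0], np.cumsum(f)))
    s2 = np.concatenate(([0.0], np.cumsum(f ** 2)))
    D = np.full(N + 1, np.inf)   # D[r]: best RSS of f[:r] with the current number of pieces
    D[0] = 0.0                   # zero pieces cover nothing at zero cost
    best = np.inf
    for pieces in range(1, k + 2):           # `pieces` pieces means `pieces - 1` jumps
        D_new = np.full(N + 1, np.inf)
        for r in range(pieces, N + 1):
            ls = np.arange(pieces - 1, r)    # start of the last piece
            rss = s2[r] - s2[ls] - (s1[r] - s1[ls]) ** 2 / (r - ls)
            D_new[r] = np.min(D[ls] + rss)
        D = D_new
        best = min(best, D[N])
    return np.sqrt(max(best, 0.0) / N)       # each cell has length 1/N


N = 240
x = (np.arange(N) + 0.5) / N                 # cell midpoints
for k in (1, 3, 7):
    print(k, k * delta_k(x, k))              # bounded, close to k / (sqrt(12) * (k + 1))
```

The same routine shows the other extreme: a step function with two jumps on the grid has $\Delta_2 = 0$, which is why functions in $S([0,1))$ behave like the limiting case of arbitrarily large $\alpha$.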

Now we give examples of well-known function spaces contained in $\mathcal A^\alpha$ for $\alpha \le 1$.

Example 1. Suppose $f$ has finite total variation. Then $f \in \mathcal A^1$ holds. Choosing $\gamma_n \asymp \log n/n$ such that condition (H2) is fulfilled yields $\|\hat f_n - f\| = O((\log n/n)^{1/3})$ $P$-a.s.

Proof. For the application of Theorem 2 we need to show that there is a $\delta > 0$ such that for all $k \in \mathbb N$, $k \ge 1$, there is an $f_k \in S([0,1))$ with $\|f - f_k\| \le \delta/(k+1)$ and $\#J(f_k) \le k$. Since each function of finite total variation is the difference of two increasing functions and $\#J(g + g') \le \#J(g) + \#J(g')$, it is enough to consider increasing $f$ with $f(0) = 0$ and $f(1) \le 1$. Define for $i = 1, \dots, k$ the intervals
$$I_i = f^{-1}\big([(i-1)/k,\, i/k)\big).$$
Since $f$ is increasing, each $I_i$ is an interval, so the step function $f_k(x) = \sum_{i=1}^k 1_{I_i}(x)(i - 1/2)/k$ has at most $k - 1$ jumps and satisfies $\|f - f_k\| \le \|f - f_k\|_\infty \le (2k)^{-1}$, which completes the proof. □
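The level-set construction in this proof is easy to check on a grid. In the sketch below (Python with NumPy; `level_set_approx` is a hypothetical helper, and evaluating on a grid only approximates the supremum), $f(x) = x^2$ is increasing with $f(0) = 0$ and $f(1) = 1$, and $f_k$ takes the value $(i - 1/2)/k$ where $(i-1)/k \le f(x) < i/k$; as an assumption, the endpoint value $f = 1$ is assigned to the top level.

```python
import numpy as np


def level_set_approx(fx, k):
    """Step approximation of an increasing f with values in [0, 1]: value (i - 1/2)/k
    wherever (i - 1)/k <= f(x) < i/k; f(x) = 1 is put on the top level i = k."""
    level = np.minimum(np.floor(k * fx), k - 1)   # equals i - 1, in 0, ..., k - 1
    return (level + 0.5) / k


x = np.linspace(0.0, 1.0, 10_001)
fx = x ** 2
for k in (2, 5, 10):
    fk = level_set_approx(fx, k)
    sup_err = np.max(np.abs(fx - fk))
    n_jumps = np.count_nonzero(np.diff(fk))
    print(k, sup_err <= 1 / (2 * k) + 1e-12, n_jumps)   # bound holds; at most k - 1 jumps
```

Because `level` is nondecreasing in $x$, every change of level is one jump, which is the reason the approximation needs at most $k - 1$ jumps.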

Example 2. Suppose $f$ belongs to a Hölder class of order $\alpha$ (with $0 < \alpha \le 1$). Then $f \in \mathcal A^\alpha$ holds. For $\gamma_n \asymp \log n/n$ fulfilling condition (H2), we get that $\|\hat f_n - f\| = O((\log n/n)^{\alpha/(2\alpha+1)})$ $P$-a.s.

Proof. Analogously to the proof above, we define for $I_i = [(i-1)/k, i/k)$ the function $f_k(x) = \sum_{i=1}^k 1_{I_i}(x) f((i - 1/2)/k)$. For $x \in I_i$ we have $|x - (i - 1/2)/k| \le (2k)^{-1}$ and hence $|f(x) - f_k(x)| \le C(2k)^{-\alpha}$. Thus $\|f - f_k\| \le \|f - f_k\|_\infty \le C(2k)^{-\alpha}$ holds. □

Obviously, this result still holds if the regression function $f$ is piecewise Hölder continuous with finitely many jumps.
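The midpoint construction from the proof of Example 2 can be checked in the same way. The sketch below (Python with NumPy; `midpoint_step_approx` is a hypothetical helper, and the grid only approximates the supremum) uses $f(x) = \sqrt{x}$, which is Hölder continuous of order $\alpha = 1/2$ with $C = 1$ because $|\sqrt x - \sqrt y| \le \sqrt{|x - y|}$, and compares the observed sup error with the bound $C(2k)^{-\alpha}$.

```python
import numpy as np


def midpoint_step_approx(f, x, k):
    """f_k(x) = f((i - 1/2)/k) for x in I_i = [(i - 1)/k, i/k); x = 1 is put in I_k."""
    i = np.minimum(np.floor(k * x), k - 1)        # equals i - 1, in 0, ..., k - 1
    return f((i + 0.5) / k)


alpha, C = 0.5, 1.0                               # sqrt is Hoelder of order 1/2 with C = 1
f = np.sqrt
x = np.linspace(0.0, 1.0, 10_001)
for k in (4, 16, 64):
    err = np.max(np.abs(f(x) - midpoint_step_approx(f, x, k)))
    print(k, err, C * (2 * k) ** (-alpha))        # observed sup error vs. the bound C (2k)^(-alpha)
```

The worst case occurs near $x = 0$, where $\sqrt{x}$ is steepest, so the bound is essentially attained in the first interval.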

Remark 1 (The case $\alpha > 1$). The characterization of the sets $\mathcal A^\alpha$ and related questions are a prominent theme in nonlinear approximation theory [see, e.g., DeVore (1998), DeVore and Lorentz (1993)]. For piecewise $C^1$ functions $f$, it is known that $f \in \mathcal A^\alpha$ with $\alpha > 1$ implies that $f$ is piecewise constant [Burchard and Hale (1975)], whereas this is still an open problem for general $f$. We conjecture that this implication holds for any $f$. This would imply that stronger smoothness assumptions than in the examples above do not yield better convergence rates.



Choosing $\gamma_n$ independently of the function and the function class, as in the examples above, yields convergence rates which are, up to a logarithmic factor, the optimal rates in the classes $\mathcal A^\alpha$, $0 < \alpha \le 1$, and $S([0,1))$. This shows that the estimate is adaptive over these classes. The additional logarithmic factor originates from giving almost sure rates of convergence.

3.2. Hausdorff convergence of the jump sets. In this section we present the rates known from change-point analysis for detecting the locations of jumps if $f$ is a step function. Moreover, the following theorem shows that we will eventually estimate the right number of jumps almost surely. Before stating the results, we recall the definition of the Hausdorff metric $\rho_H$ on the space of closed subsets contained in $(0,1)$. For nonempty closed sets $A, B \subset (0,1)$ set
$$\rho_H(A, B) = \max\Big\{ \max_{b\in B}\min_{a\in A} |b - a|,\ \max_{a\in A}\min_{b\in B} |b - a| \Big\}$$
and $\rho_H(A, \emptyset) = \rho_H(\emptyset, A) = 1$.
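Theorem 3 measures jump detection in this metric, so a direct implementation for finite jump sets may help. The sketch below is plain Python (`hausdorff` is a hypothetical name); the text does not specify the case where both sets are empty, and returning 0 there is an assumption.

```python
def hausdorff(A, B):
    """rho_H for finite subsets of (0, 1), with rho_H(A, {}) = rho_H({}, A) = 1 as in the text."""
    A, B = sorted(A), sorted(B)
    if not A and not B:
        return 0.0          # assumption: both sets empty (not covered by the text's convention)
    if not A or not B:
        return 1.0
    d_BA = max(min(abs(b - a) for a in A) for b in B)   # farthest point of B from A
    d_AB = max(min(abs(b - a) for b in B) for a in A)   # farthest point of A from B
    return max(d_BA, d_AB)


true_jumps = {0.3, 0.7}
print(hausdorff(true_jumps, {0.31, 0.69}))        # both jumps found: distance about 0.01
print(hausdorff(true_jumps, {0.31, 0.5, 0.69}))   # a spurious jump at 0.5 dominates: about 0.2
```

The second call shows why Theorem 3(i), the correct number of jumps, is needed before the location rates in (ii) and (iii) can hold: a single spurious jump keeps $\rho_H$ bounded away from zero.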

Theorem 3. Let $f \in S([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ fulfill (H1). Then:

(i) $\#J(\hat f_n) = \#J(f)$ for large enough $n$ $P$-a.s.,

(ii) $\rho_H(J(\hat f_n), J(f)) = O(\log n/n)$ $P$-a.s.,

(iii) $\rho_H(J(\hat f_n), J(f)) = O_P(1/n)$.

Remark 2 (Distribution of the jump locations and estimated function values). With the help of Theorem 3(i) we can derive the asymptotic distribution of the jump locations and of the estimated function values between them, obtaining the same results as Yao and Au (1989), who assumed an a priori bound on the number of jumps. To this end, note that the estimator of Yao and Au (1989) and the Potts minimizer coincide if they have the same number of jumps. Denoting the ordered jumps of $f$ and their estimators by $(\tau_1, \dots, \tau_R)$ and $(\hat\tau_1, \dots, \hat\tau_{\hat R})$, respectively, we know by Theorem 3(i) that asymptotically $\hat R = R$ holds almost surely. For $\hat R = R$ we get that $n(\hat\tau_1 - \tau_1), \dots, n(\hat\tau_R - \tau_R)$ are asymptotically independent, and the limit distribution of $n(\hat\tau_r - \tau_r)$ is that of the location of the minimum of a two-sided asymmetric random walk [cf. Yao and Au (1989), Theorem 1]. Moreover, the estimated function values are asymptotically normal with the parametric $\sqrt n$-rate.

3.3. Convergence in Skorokhod topology. Now that we have established rates of convergence for the graph of the function as well as for the set of jump points, it is natural to ask whether one can handle both simultaneously. To this end, we recall the definition of the Skorokhod metric [Billingsley (1968), Chapter 3]. Let $\Lambda$ denote the set of all strictly increasing continuous functions $\lambda : [0,1] \to [0,1]$ which are onto. We define for $f, g \in D([0,1))$
$$\rho_S(f, g) = \inf\Big\{ \max\Big( L(\lambda),\ \sup_t |f(\lambda(t)) - g(t)| \Big) : \lambda \in \Lambda \Big\},$$
where $L(\lambda) = \sup_{s \ne t} \big| \log \frac{\lambda(t) - \lambda(s)}{t - s} \big|$. The topology induced by this metric is called the $J_1$-topology.
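For two step functions with the same number of jumps, a concrete time change already gives a useful upper bound on $\rho_S$, which is the situation of Theorem 4(iii) once Theorem 3(i) applies. The sketch below (plain Python; `skorokhod_upper_bound` is a hypothetical name, and it computes only an upper bound, not the infimum) uses the piecewise linear $\lambda$ that maps the jumps of $g$ onto those of $f$. For such a $\lambda$, every difference quotient lies between the smallest and largest slope, so $L(\lambda)$ equals the largest $|\log(\text{slope})|$, and $f(\lambda(t)) - g(t)$ reduces to plateau-wise value differences.

```python
import math


def skorokhod_upper_bound(jumps_f, values_f, jumps_g, values_g):
    """Upper bound for rho_S(f, g) when f, g are step functions on [0, 1) with the same
    number of jumps (sorted lists) and plateau values listed from left to right.

    Uses the piecewise linear time change lambda with lambda(0) = 0, lambda(1) = 1 and
    lambda(jumps_g[r]) = jumps_f[r]."""
    if (len(jumps_f) != len(jumps_g) or len(values_f) != len(jumps_f) + 1
            or len(values_g) != len(values_f)):
        raise ValueError("need equal jump counts and one value per plateau")
    knots_f = [0.0, *jumps_f, 1.0]
    knots_g = [0.0, *jumps_g, 1.0]
    log_slopes = [
        abs(math.log((knots_f[r + 1] - knots_f[r]) / (knots_g[r + 1] - knots_g[r])))
        for r in range(len(knots_f) - 1)
    ]
    value_gap = max(abs(a - b) for a, b in zip(values_f, values_g))
    return max(max(log_slopes), value_gap)


# True jump at 0.5, estimated jump at 0.52, slightly wrong plateau values.
print(skorokhod_upper_bound([0.5], [0.0, 1.0], [0.52], [0.1, 0.95]))  # value gap 0.1 dominates
```

The bound combines both error sources of Section 3: a jump location error of order $1/n$ enters through $L(\lambda)$, and a plateau value error of order $n^{-1/2}$ enters through the value gap.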

We find that in the situation of Theorem 1(i) we can establish consistency without further assumptions, whereas in the situation of Theorem 1(ii), $f$ has to belong to $D([0,1))$.

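Theorem 4. (i) Under the assumptions of Theorem 1(i) we have
$$\hat f_n \xrightarrow[n\to\infty]{D([0,1))} f_\gamma \quad P\text{-a.s.}$$

(ii) If $f \in D([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ satisfies condition (H1), then
$$\hat f_n \xrightarrow[n\to\infty]{D([0,1))} f \quad P\text{-a.s.}$$
If $f$ is continuous on $[0,1]$, then
$$\hat f_n \xrightarrow[n\to\infty]{L^\infty([0,1])} f \quad P\text{-a.s.}$$

(iii) If $f \in S([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ satisfies condition (H1), then
$$\rho_S(\hat f_n, f) = O\big(\sqrt{\log n/n}\big) \quad P\text{-a.s.}$$
Moreover, $\rho_S(\hat f_n, f) = O_P(n^{-1/2})$.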
3.4. Parameter choice and simulated data. In this section we assume $\xi_i^n \sim N(0, \sigma^2)$, $i = 1, \dots, n$, i.i.d. for all $n$. Note that in this case we have $\beta = \sigma^2/2$ in Condition (A). Theorem 2 directly yields a simple data-driven procedure for choosing the parameter $\gamma$ which leads to optimal rates of convergence. For a strongly consistent estimate $\hat\sigma$ of $\sigma$, the choice $\gamma_n = C\hat\sigma^2 \log n/n$ almost surely satisfies condition (H2) for $C > 6$ and gives the rates of Theorem 2. However, in simulations it turns out that smaller choices of $C$ lead to better reconstructions. A closer look at the proof of Theorem 2 shows that the constant in condition (H2) mainly depends on the behavior of the maximum of the partial sum process $\sup_{1\le i\le j\le n} (\xi_i^n + \cdots + \xi_j^n)^2/(j - i + 1)$. As we consider a triangular scheme instead of a sequence of i.i.d. random variables for the error, we cannot use results as in Shao (1995) to obtain an almost sure bound for this process [cf. Tomkins (1974)]. But those results give an upper bound in probability (cf. Lemma A.2) for the maximum. This allows us to refine the bound above to $C > 2 + \delta$ for any $\delta > 0$ and obtain the rates of Theorem 2 in probability. We found that values of $C$ between 2 and 3 lead to good reconstructions in various simulation settings.
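The data-driven choice $\gamma_n = C\hat\sigma^2 \log n/n$ takes only a few lines. The paper does not fix $\hat\sigma$ here, so, as an assumption, the sketch below (Python with NumPy; `estimate_sigma` and `potts_gamma` are hypothetical names) uses a median-absolute-deviation estimate from first differences, $\hat\sigma = \operatorname{median}_i |Y_{i+1} - Y_i| / (\sqrt 2\, \Phi^{-1}(3/4))$, which is consistent for Gaussian noise and barely affected by a few jumps. The default $C = 2.5$ is the value used for Figure 1.

```python
import numpy as np

PHI_INV_3_4 = 0.6744897501960817  # 0.75-quantile of the standard normal distribution


def estimate_sigma(y):
    """MAD-type noise estimate from first differences (an assumption, not the paper's choice)."""
    d = np.diff(np.asarray(y, dtype=float))
    return np.median(np.abs(d)) / (np.sqrt(2.0) * PHI_INV_3_4)


def potts_gamma(y, C=2.5):
    """gamma_n = C * sigma_hat^2 * log(n) / n; C > 6 matches (H2), C in [2, 3] worked well in simulations."""
    n = len(y)
    return C * estimate_sigma(y) ** 2 * np.log(n) / n


rng = np.random.default_rng(0)
n, sigma = 4096, 2.0
signal = np.repeat([0.0, 5.0, -3.0, 1.0], n // 4)   # a step function with three jumps
y = signal + sigma * rng.standard_normal(n)
print(estimate_sigma(y), potts_gamma(y))
```

Differencing removes the piecewise constant signal except at the three jumps, and the median ignores those few large differences, which is why a step signal does not inflate $\hat\sigma$.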

Figure 1 shows the behavior of the Potts minimizer for the test signals of Donoho and Johnstone (1994), sampled at 2048 points, with the choice $C = 2.5$. In order to understand the finite sample behavior of the Potts minimizer, the estimates are calculated at different signal-to-noise ratios $\|f\|^2/\sigma^2$ (seven, four and one). The reconstructions of the locally constant Blocks signal (first row) differ very little from the original signal. This is not surprising, since the original signal is in $S([0,1))$, where the estimator achieves parametric rates. The spikes of the Bumps signal (second row) are correctly estimated in all cases. The estimator captures all relevant features of the HeaviSine signal (third row) at the levels seven and four. Only in the presence of strong noise is the detail of the spike to the right of the second maximum lost. Finally, the case of the Doppler signal (fourth row) shows that the estimator adapts well to locally changing smoothness.

Clearly, the performance depends on the particular function $f$. Hence one might want to try different approaches to selecting the parameter. One possibility is to choose the smoothing parameter according to the multiresolution criterion of Davies and Kovac (2001). If $f \in S([0,1))$, this criterion asymptotically picks the correct number of jumps.

Theorem 5. Assume $f \in S([0,1))$, $\xi_i^n \sim N(0, \sigma^2)$ i.i.d., and $\hat\gamma_n$ is chosen according to the MR-criterion, that is, $\hat\gamma_n$ is the maximal value such that the corresponding reconstruction $\hat f_n^{MR} = \iota_n(\hat u)$, $\hat u = T_{\hat\gamma_n}(Y)$, satisfies
$$\frac{1}{\sqrt{\#I}}\Big|\sum_{i\in I}(Y_i - \hat u_i)\Big| \le (1 + \delta)\,\hat\sigma\sqrt{2\log n} \tag{7}$$
for all connected $I \subset \{1, \dots, n\}$, some $\delta > 0$ and some consistent estimate $\hat\sigma$ of $\sigma$. Moreover, assume $\hat\gamma_n$ satisfies condition (H1) and $\hat f_n$ is the corresponding reconstruction.

[Figure 1]
Fig. 1. The left column shows signals from Donoho and Johnstone (1994). Columns 2, 4 and 6 show noisy versions with signal-to-noise ratios of 7, 4 and 1, respectively. On the right of each noisy signal is the Potts reconstruction. The penalty was chosen as $\gamma_n = 2.5\hat\sigma^2 \log n/n$, where $\hat\sigma^2$ is an estimate of the variance.
Note that it is possible to derive the same result if in (7) only dyadic 
intervals [see Davies and Kovac (2001)] are considered. We conjecture that 
the MR-criterion leads to consistent estimates in more general settings. 
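Checking the multiresolution condition for a given reconstruction is straightforward with prefix sums of the residuals. The sketch below (Python with NumPy; `mr_condition_holds` is a hypothetical name) reads (7) as $|\sum_{i \in I}(Y_i - \hat u_i)|/\sqrt{\#I} \le (1 + \delta)\hat\sigma\sqrt{2\log n}$ over all discrete intervals $I$ and tests all $O(n^2)$ of them; restricting to dyadic intervals, as mentioned above, would reduce the cost. Selecting $\hat\gamma_n$ would then amount to taking the largest $\gamma$ on a grid whose Potts reconstruction passes the check.

```python
import numpy as np


def mr_condition_holds(y, fit, sigma_hat, delta=0.1):
    """True if |sum_{i in I} (y_i - fit_i)| / sqrt(#I) <= (1 + delta) * sigma_hat * sqrt(2 log n)
    for every discrete interval I = {l+1, ..., r} of {1, ..., n}."""
    res = np.asarray(y, dtype=float) - np.asarray(fit, dtype=float)
    n = len(res)
    S = np.concatenate(([0.0], np.cumsum(res)))   # S[r] - S[l] = sum of res[l:r]
    bound = (1.0 + delta) * sigma_hat * np.sqrt(2.0 * np.log(n))
    for length in range(1, n + 1):                # all intervals of one length at once
        sums = S[length:] - S[:-length]
        if np.max(np.abs(sums)) / np.sqrt(length) > bound:
            return False
    return True
```

A fit that misses a jump leaves a long run of residuals with the same sign, and the interval covering that run violates the bound. This is the mechanism by which the criterion forces enough jumps into the reconstruction.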

4. Discussion: relation to other models. The Potts smoother falls into the general framework of van de Geer (2001), which gives very general and powerful tools to prove rates of convergence for penalized least squares estimates. With some effort, it is possible to use the methods developed in that paper to derive the convergence rates given in Theorem 2. However, using that method does not lead to the required constant in Section 3.4. In fact, the resulting constant in condition (H2) would be substantially larger.

Most penalized least squares methods either use a penalty which is a seminorm (as in spline regression) or penalize the number or size of coefficients of an orthonormal basis reconstruction. Note that the Potts smoother belongs to neither of these classes. Nonetheless, it is related to various other statistical procedures, and we would like to close this paper by highlighting these relations and briefly commenting on possible extensions to two dimensions.

Bayesian interpretation and imaging. In image analysis, Bayesian methods for restoration have received much attention [see, e.g., Geman and Geman (1984)]. The Potts functional can be interpreted as a limit of the one-dimensional version of a certain MAP estimator, which has been used for edge-preserving smoothing and is discussed by Blake and Zisserman (1987) and Künsch (1994), among many others. For a detailed discussion and overview of related functionals in dimension 1, see Winkler et al. (2005).

Generalization to 2d. For two-dimensional data, a measure of complexity corresponding to the number of jumps is given by the number of plateaus or partition elements. However, it is computationally infeasible to allow for arbitrary partitions in the reconstruction. Therefore one chooses a subclass of step functions with good approximation properties and seeks effective minimization algorithms in this class. As in the one-dimensional case, the rate of convergence will be determined by the approximation properties of the chosen function class. One example, complexity penalized sums of squares with respect to a class of "Wedgelets" [cf. Donoho (1999)], is discussed in the Ph.D. thesis of Friedrich (2005), and possible alternatives in the survey by Führ, Demaret and Friedrich (2006). We mention that the proof of Theorem 2 could be adapted to their setting.







L. BOYSEN ET AL. 



APPENDIX: PROOFS 

A.1. Preliminaries. Since the consistency results are formulated in terms of a function space, we translate all minimization problems to equivalent problems for functionals on $L^2([0,1))$. Therefore we introduce the functionals $\tilde H_\gamma^\infty(g, f) = H_\gamma^\infty(g, f) - \|f\|^2$, and $H_\gamma^n(g, f)$ is defined as $\tilde H_\gamma^\infty(g, f)$ for $g \in S_n([0,1)) := \iota_n(\mathbb R^n)$, and $\infty$ otherwise. Clearly, the functionals are constructed in such a way that the minimization of $H_\gamma$ in (3) on $\mathbb R^n$ is equivalent to the minimization of $H_\gamma^n$ if we identify the minimizers via the map $\iota_n$ defined in (4). The constant $-\|f\|^2$ is just added for convenience and does not affect the minimization. Obviously, $u \in \operatorname{argmin} H_\gamma(\cdot, \bar f)$ if and only if $\iota_n(u) \in \operatorname{argmin} H_\gamma^n(\cdot, f)$, where $\bar f = (\bar f_1^n, \dots, \bar f_n^n)$ is the vector of means from (2), and similarly for $H_\gamma^n(\cdot, y)$ with $y \in \mathbb R^n$. The most important property of these functionals is that the minimizers $g \in S([0,1))$ of $\tilde H_\gamma^\infty(\cdot, f)$ and $H_\gamma^n(\cdot, f)$ for $\gamma > 0$ are determined by their jump set $J(g)$ and given by the projection onto the space of step functions which are constant outside that set. To make this precise in the course of the proofs, we introduce for any $J \subset (0,1)$ the partition $P_J = \{[a, b) : a, b \in J \cup \{0, 1\},\ (a, b) \cap J = \emptyset\}$. Abbreviating by
$$\mu_I(f) = \ell(I)^{-1}\int_I f(u)\,du$$
the mean of $f$ over some interval $I$, where $\ell(I)$ denotes the length of $I$, this projection is then given by
$$f_J = \sum_{I\in P_J} \mu_I(f)\, 1_I.$$
Further, we extend the noise in (1) to $L^2([0,1))$ by $\xi^n = \iota_n((\xi_1^n, \dots, \xi_n^n))$ and, finally, we define for $f \in S([0,1))$ the minimum distance between any two jumps as
$$\operatorname{mpl}(f) := \min\{|s - t| : s \ne t \in J(f) \cup \{0, 1\}\}. \tag{8}$$
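On a grid, the projection $f_J$ and the quantity mpl in (8) can be computed directly. The sketch below is an illustration under assumptions (Python with NumPy; $f$ represented by its values on $N$ equal cells of $[0,1)$; jumps given as grid points $j/N$; `project_onto_jumps`, `mpl` and `norm2` are hypothetical names). It replaces $f$ on each interval of $P_J$ by its mean $\mu_I(f)$, and the check confirms the Pythagorean identity $\|f\|^2 = \|f_J\|^2 + \|f - f_J\|^2$, which reflects that $f_J$ is an orthogonal projection.

```python
import numpy as np


def project_onto_jumps(fvals, jumps):
    """f_J on a grid: fvals are the values of f on N equal cells of [0, 1);
    jumps is a set of grid points j/N in (0, 1). Returns the plateau-wise means."""
    f = np.asarray(fvals, dtype=float)
    N = len(f)
    cuts = [0] + sorted(int(round(t * N)) for t in jumps) + [N]
    out = np.empty(N)
    for a, b in zip(cuts, cuts[1:]):   # each [a/N, b/N) is an element of P_J
        out[a:b] = f[a:b].mean()       # mu_I(f) on I = [a/N, b/N)
    return out


def mpl(jumps):
    """Minimal distance between distinct points of J together with {0, 1}, as in (8)."""
    pts = sorted(set(jumps) | {0.0, 1.0})
    return min(b - a for a, b in zip(pts, pts[1:]))


def norm2(g):
    """Squared L2([0, 1)) norm of a step function given by its values on N equal cells."""
    return np.mean(np.asarray(g, dtype=float) ** 2)


N = 100
x = (np.arange(N) + 0.5) / N
f = np.sin(2 * np.pi * x) + (x >= 0.5)
fJ = project_onto_jumps(f, {0.25, 0.5})
print(np.isclose(norm2(f), norm2(fJ) + norm2(f - fJ)), mpl({0.25, 0.5}))
```

Because the minimizers of $H_\gamma^n$ have the form $f_J$, a search over jump sets $J$ together with this projection describes the whole minimization problem.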

The proofs rely on properties of the noise, some a priori properties of the 
Potts minimizers and on proving epiconvergence of the functionals defined 
above with respect to the topology of L 2 ([0, 1)). 



A.2. Two properties of the noise. The behavior of $\xi_J^n = \sum_{I\in P_J} \mu_I(\xi^n)\, 1_I$ for noise satisfying Condition (A) is controlled by the following two estimates, which are proved in Boysen et al. (2007), Section 4.2.

Lemma A.1. Let $(\xi_i^n)_{n\in\mathbb N,\, 1\le i\le n}$ fulfill Condition (A). For
$$C_n := \sup_{1\le i\le j\le n} \frac{(\xi_i^n + \cdots + \xi_j^n)^2}{(j - i + 1)\log n} \tag{9}$$
we have that
$$\limsup_{n\to\infty} C_n \le 12\beta \quad P\text{-a.s.}$$
Moreover, for all intervals $I \subset [0,1)$ and all $n \in \mathbb N$
$$\mu_I(\xi^n)^2 \le \frac{C_n \log n}{n\,\ell(I)}$$
as well as
$$\|\xi_J^n\|^2 = \sum_{I\in P_J} \ell(I)\,\mu_I(\xi^n)^2 \le C_n \frac{\log n}{n}(\#J + 1). \tag{10}$$

Lemma A.2. Assume $\xi_i^n \sim N(0, \sigma^2)$, $i = 1, \dots, n$, i.i.d. for all $n$. Then for $C_n$ defined by (9) we have $C_n = 2\sigma^2 + o_P(1)$.
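The statistic $C_n$ in (9) can be simulated to get a feeling for Lemmas A.1 and A.2. The sketch below (Python with NumPy; `partial_sum_statistic` is a hypothetical name) evaluates the supremum over all $O(n^2)$ windows with prefix sums. For standard Gaussian noise, Lemma A.2 says $C_n \to 2\sigma^2$ in probability, while Lemma A.1 only guarantees $\limsup C_n \le 12\beta = 6\sigma^2$ almost surely; convergence is slow, so finite-$n$ values typically lie somewhat above $2\sigma^2$.

```python
import numpy as np


def partial_sum_statistic(xi):
    """C_n from (9): max over 1 <= i <= j <= n of (xi_i + ... + xi_j)^2 / ((j - i + 1) log n)."""
    xi = np.asarray(xi, dtype=float)
    n = len(xi)
    S = np.concatenate(([0.0], np.cumsum(xi)))
    best = 0.0
    for m in range(1, n + 1):          # m = j - i + 1; all windows of length m at once
        sums = S[m:] - S[:-m]
        best = max(best, float(np.max(sums ** 2)) / m)
    return best / np.log(n)


rng = np.random.default_rng(2)
sigma = 1.0
for n in (500, 2000):
    C_n = partial_sum_statistic(sigma * rng.standard_normal(n))
    print(n, C_n)   # compare with 2 sigma^2 (Lemma A.2) and 12 beta = 6 sigma^2 (Lemma A.1)
```

This gap between $2\sigma^2$ and $6\sigma^2$ is exactly the gap between the constants $C > 2$ and $C > 6$ discussed in Section 3.4.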

A.3. A priori properties of the minimizers. The following properties of the minimizers are used to prove our main statements.

Lemma A.3. Let $f \in L^2([0,1))$, $g \in \operatorname{argmin} H_\gamma^n(\cdot, f)$ and $I \in P_{J(g)}$. Then, denoting $a = \mu_I(g) = \ell(I)^{-1}\int_I g(u)\,du$, the following statements are valid.

(i) If $I' \in P_{J(g)}$ and $I' \cup I$ is an interval, then
$$\gamma \le \frac{\ell(I)\,\ell(I')}{\ell(I) + \ell(I')}\big(\mu_I(f) - \mu_{I'}(f)\big)^2.$$

(ii) If $I' \in B_n$, $I' \subset I$, is an interval, then
$$2\gamma \ge \ell(I')\big(\mu_{I'}(f) - a\big)^2.$$

(iii) If both $I' \in B_n$ and $I' \cup I$ are intervals and $1_{I'} g = b\, 1_{I'}$ for some $b \in \mathbb R$, then
$$(b - a)\Big(\mu_{I'}(f) - \frac{a + b}{2}\Big) \ge 0.$$

(iv) If $I_1', I_2', I_1' \cup I, I_2' \cup I \in B_n$ are intervals and $1_{I_l'} g = b_l 1_{I_l'}$, $l = 1, 2$, then for all disjoint intervals $I_1, I_2 \in B_n$, $I = I_1 \cup I_2$, such that $I_1 \cup I_1'$ and $I_2 \cup I_2'$ are intervals,
$$\ell(I_1)\big(\mu_{I_1}(f) - b_1\big)^2 + \ell(I_2)\big(\mu_{I_2}(f) - b_2\big)^2 \ge \gamma + \ell(I_1)\big(\mu_{I_1}(f) - a\big)^2 + \ell(I_2)\big(\mu_{I_2}(f) - a\big)^2.$$
Proof. The inequalities are obtained by elementary calculations comparing the values of $H^n_\gamma(\cdot,f)$ at $g$ and at some $\tilde g$ obtained from $g$ by: joining the plateaus at $I$ and $I'$ [for (i)], splitting the plateau at $I$ into three plateaus [for (ii)], moving the jump point [for (iii)], and removing the plateau at $I$ by joining each of its parts to the adjacent intervals [for (iv)].

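As an example, we provide the calculations for (i). Determine $t$ by $\{t\}=\bar I\cap\bar I'$ and set $\tilde g=f_{J(g)\setminus\{t\}}$. Then $\tilde g$ differs from $g$ only on $I\cup I'$, such that
$$\begin{aligned}0&\le H^n_\gamma(\tilde g,f)-H^n_\gamma(g,f)\\&=-\gamma+\big\|\big(\mu_I(f)-\mu_{I\cup I'}(f)\big)1_I\big\|^2+\big\|\big(\mu_{I'}(f)-\mu_{I\cup I'}(f)\big)1_{I'}\big\|^2\\&=-\gamma+\ell(I)\big(\mu_I(f)-\mu_{I\cup I'}(f)\big)^2+\ell(I')\big(\mu_{I'}(f)-\mu_{I\cup I'}(f)\big)^2\\&=-\gamma+\frac{\ell(I)\,\ell(I')}{\ell(I)+\ell(I')}\big(\mu_I(f)-\mu_{I'}(f)\big)^2,\end{aligned}$$
which completes the proof of (i). □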
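To see Lemma A.3 at work, the following Python sketch computes an exact minimizer of $H^n_\gamma(\cdot,f)$ when $f$ is a step function on the grid (our own illustrative setting: the values of $f$ on the $n$ grid cells are stored in a list `y`, so that $\|g-f\|^2=n^{-1}\sum_i(g_i-y_i)^2$), using the standard $O(n^2)$ dynamic programme for the Potts functional. It then checks inequalities (i) and (ii) on the plateaus of the computed minimizer. The function names are hypothetical, and the dynamic programme is a textbook algorithm, not necessarily the implementation used by the authors.

```python
import random


def potts_minimizer(y, gamma):
    """Exact minimizer of gamma * #jumps + (1/n) * sum_i (g_i - y_i)^2.

    Returns the sorted jump positions k (a jump at k/n).
    """
    n = len(y)
    s1 = [0.0] * (n + 1)   # prefix sums of y
    s2 = [0.0] * (n + 1)   # prefix sums of y^2
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(a, b):  # squared deviations from the mean on cells a, ..., b-1
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    best = [0.0] * (n + 1)
    best[0] = -gamma        # the first plateau carries no jump penalty
    last = [0] * (n + 1)    # start of the last plateau in an optimal fit of y[:r]
    for r in range(1, n + 1):
        best[r] = float("inf")
        for left in range(r):
            val = best[left] + gamma + sse(left, r) / n
            if val < best[r]:
                best[r], last[r] = val, left
    jumps, r = [], n
    while r > 0:            # backtrack through the plateau starts
        r = last[r]
        if r > 0:
            jumps.append(r)
    return sorted(jumps)


def segment_mean(y, a, b):
    """mu_I(f) for I = [a/n, b/n), i.e. the mean of y[a:b]."""
    return sum(y[a:b]) / (b - a)


def check_lemma_a3(y, gamma, jumps, tol=1e-9):
    """Assert Lemma A.3(i) and (ii) for the plateaus defined by `jumps`."""
    n = len(y)
    cuts = [0] + sorted(jumps) + [n]
    plateaus = list(zip(cuts, cuts[1:]))
    for (a, b), (_, c) in zip(plateaus, plateaus[1:]):   # (i): neighbors
        l1, l2 = (b - a) / n, (c - b) / n
        diff = segment_mean(y, a, b) - segment_mean(y, b, c)
        assert gamma <= l1 * l2 / (l1 + l2) * diff ** 2 + tol, "A.3(i) fails"
    for a, b in plateaus:                                # (ii): I' inside I
        level = segment_mean(y, a, b)
        for u in range(a, b):
            for v in range(u + 1, b + 1):
                gap = segment_mean(y, u, v) - level
                assert (v - u) / n * gap ** 2 <= 2 * gamma + tol, "A.3(ii) fails"
    return True


rng = random.Random(0)
y = [0.0] * 20 + [1.0] * 15 + [-0.5] * 25
y = [v + rng.gauss(0.0, 0.3) for v in y]
gamma = 0.02
jumps = potts_minimizer(y, gamma)
print("jumps at k/n for k in", jumps, "| Lemma A.3 (i), (ii):", check_lemma_a3(y, gamma, jumps))
```

Both checks can only pass because the dynamic programme returns a global minimizer: each inequality compares $H^n_\gamma$ at the minimizer with a competitor in $S_n([0,1))$, exactly as in the proof.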

A.4. Epiconvergence. One basic idea of the consistency proofs is to use the concept of epiconvergence of the functionals [see, e.g., Dal Maso (1993), Hess (1996)]. We say that numerical functions $F_n:\Theta\to\mathbb R\cup\{\infty\}$, $n=1,\ldots,\infty$, on a metric space $(\Theta,\rho)$ epiconverge to $F_\infty$ if for all sequences $(\vartheta_n)_{n\in\mathbb N}$ with $\vartheta_n\to\vartheta\in\Theta$ we have $F_\infty(\vartheta)\le\liminf_{n\to\infty}F_n(\vartheta_n)$, and for all $\vartheta\in\Theta$ there exists a sequence $(\vartheta_n)_{n\in\mathbb N}$ with $\vartheta_n\to\vartheta$ such that $F_\infty(\vartheta)\ge\limsup_{n\to\infty}F_n(\vartheta_n)$. One important property is that each accumulation point of a sequence of minimizers of $F_n$ is a minimizer of $F_\infty$. However, this does not mean that a sequence of minimizers has accumulation points at all. To prove this, one needs to show that the minimizers are contained in a compact set. The following lemma, which is a straightforward consequence of the characterization of compact subsets of $D([0,1))$ [Billingsley (1968), Theorem 14.3], will be applied to this end.

Lemma A.4. A subset $A\subset D([0,1))$ is relatively compact if the following two conditions hold:

(C1) For all $t\in[0,1]$ there is a compact set $K_t\subset\mathbb R$ such that
$$g(t)\in K_t\quad\text{for all }g\in A.$$

(C2) For all $\varepsilon>0$ there exists a $\delta>0$ such that for all $g\in A$ there is a step function $g_\varepsilon\in S([0,1))$ such that
$$\sup\{|g(t)-g_\varepsilon(t)|:t\in[0,1]\}<\varepsilon\quad\text{and}\quad\mathrm{mpl}(g_\varepsilon)\ge\delta,$$

where mpl is defined by (8).
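Condition (C2), and later (13) and (14), work with the minimal plateau length. The short Python sketch below computes it for a step function given by its jump set. It assumes that (8) defines $\mathrm{mpl}(g)$ as the smallest distance between distinct points of $J(g)\cup\{0,1\}$, with all jumps in $(0,1)$; the function name is hypothetical.

```python
def mpl(jumps):
    """Minimal plateau length of a step function on [0, 1) with jump set `jumps`.

    Assumption: mpl(g) = min{|s - t| : s != t, s, t in J(g) ∪ {0, 1}}.
    """
    pts = [0.0] + sorted(set(jumps)) + [1.0]
    return min(b - a for a, b in zip(pts, pts[1:]))


# Jumps at 0.25 and 0.5 give plateaus of lengths 0.25, 0.25 and 0.5.
print(mpl([0.5, 0.25]))  # prints 0.25
```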
A.5. The proof of Theorem 1(i), (ii) and Theorem 4(i), (ii). For the sake of brevity we just give a short outline of the proof of the first two parts of Theorem 1 and the proof of Theorem 4(i). The details can be found in Boysen et al. (2007). The proof of Theorem 1(iii) is postponed to Section A.7, because it requires the proof of Theorem 3.

Proof of Theorem 1(i), (ii). Note that condition (H1) automatically holds if $\gamma_n\to\gamma>0$. We can thus prove both parts at once: Use first $H^n_{\gamma_n}(\hat f_n,f+\xi^n)\le H^n_{\gamma_n}(0,f+\xi^n)$, $\gamma_nn/\log n\to\infty$ and (10) to obtain
$$\#J_n\le\frac{2\|f\|^2+2C_n(\log n/n)}{\gamma_n-2C_n(\log n/n)}.\tag{11}$$
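For completeness, the step from the comparison $H^n_{\gamma_n}(\hat f_n,f+\xi^n)\le H^n_{\gamma_n}(0,f+\xi^n)$ to (11) can be spelled out as follows. This is our own expansion of the elementary argument; it uses that $\hat f_n=(f+\xi^n)_{J_n}$ is an orthogonal projection, so that $\|\hat f_n-(f+\xi^n)\|^2=\|f+\xi^n\|^2-\|(f+\xi^n)_{J_n}\|^2$, while $H^n_{\gamma_n}(0,f+\xi^n)=\|f+\xi^n\|^2$.

```latex
\begin{aligned}
\gamma_n\,\#J_n &\le \|(f+\xi^n)_{J_n}\|^2
  \le 2\|f_{J_n}\|^2+2\|\xi^n_{J_n}\|^2
  \le 2\|f\|^2+2C_n\frac{\log n}{n}\,(\#J_n+1),\\
\#J_n\Big(\gamma_n-2C_n\frac{\log n}{n}\Big) &\le 2\|f\|^2+2C_n\frac{\log n}{n}.
\end{aligned}
```

Dividing by $\gamma_n-2C_n\log n/n$, which is positive for large $n$ because $\gamma_nn/\log n\to\infty$ and $\limsup_nC_n<\infty$ by Lemma A.1, gives (11).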

Then (10) and $\gamma_nn/\log n\to\infty$ imply
$$\|\xi^n_{J_n}\|^2=\sum_{I\in\mathcal P_{J_n}}\ell(I)\,\mu_I(\xi^n)^2\to0\quad P\text{-a.s.}\tag{12}$$



The map
$$g\mapsto\begin{cases}\#J(g),&\text{if }g\in S([0,1)),\\ \infty,&\text{if }g\notin S([0,1)),\end{cases}$$
is lower semicontinuous as a map from $L^2$ to $\mathbb N\cup\{\infty\}$. Using this together with (11) and (12), we can verify the two inequalities from the definition of epiconvergence and deduce that $H^n_{\gamma_n}(\cdot,f+\xi^n)$ epiconverges to $H^\infty_\gamma(\cdot,f)$ for $\gamma_n\to\gamma\ge0$ and $\gamma_nn/\log n\to\infty$. Since for any $f\in L^2([0,1))$ the set $\{f_J:J\subset(0,1),\ \#J<\infty\}$ is relatively compact in $L^2([0,1))$, a comparison of $H^n_{\gamma_n}(\hat f_n,f+\xi^n)$ with $H^n_{\gamma_n}(0,f+\xi^n)$ and usage of (11) above yields that the set $\bigcup_{n\in\mathbb N}\operatorname{argmin}H^n_{\gamma_n}(\cdot,f+\xi^n)$ is relatively compact. The uniqueness of the minimizer of $H^\infty_\gamma(\cdot,f)$, along with the epiconvergence of $H^n_{\gamma_n}(\cdot,f+\xi^n)$ and the compactness, finally implies convergence of the minimizers. □

Proof of Theorem 4(i). To prove this, one can proceed in a similar way as above. The proof of Lemma 1 is straightforward using $H^\infty_\gamma(0,f)=\|f\|^2$ and the relative compactness of $\{f_J:\#J\le\|f\|^2/\gamma\}$ in $L^2([0,1))$ for $\gamma>0$. □

Next, we will prove consistency in the space $D([0,1))$ equipped with the Skorokhod $J_1$-topology. This part is considerably more elaborate; in particular, we need some of the a priori information about the minimizers provided by Lemma A.3.



Proof of Theorem 4(ii). All equations in this proof hold $P$-almost surely, which will be omitted for ease of notation. If $f_1,f_2\in D([0,1))$ are limit points of the sequence of minimizers, we know by Theorem 1(ii) that $f=f_1=f_2$ in $L^2([0,1))$, which implies that they are equal in $D([0,1))$. Thus, it is enough to show that the minimizers $\{(f+\xi^n)_{J_n}:n\in\mathbb N\}$ are contained in a compact set. For this goal we now use the conditions (C1), (C2) from Lemma A.4.

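For the proof of (C1), consider any interval $I\in\mathcal P_{J_n}$. We know from part (i) of Lemma A.3, for any neighboring interval $I'$, that
$$\begin{aligned}\gamma_n&\le\frac{\ell(I)\,\ell(I')}{\ell(I)+\ell(I')}\big(\mu_I(f+\xi^n)-\mu_{I'}(f+\xi^n)\big)^2\\&\le\frac{3\,\ell(I)\,\ell(I')}{\ell(I)+\ell(I')}\Big(4\|f\|_\infty^2+\frac{C_n\log n}{n\,\ell(I)}+\frac{C_n\log n}{n\,\ell(I')}\Big)\\&\le12\|f\|_\infty^2\,\ell(I)+6C_n\frac{\log n}{n}.\end{aligned}$$
This yields $1/\ell(I)=O(\gamma_n^{-1})$. Application of Lemma A.1 yields
$$\|\xi^n_{J_n}\|_\infty^2=\max\{\mu_I(\xi^n)^2:I\in\mathcal P_{J_n}\}=O\Big(\frac{\log n}{n\gamma_n}\Big)=o(1)$$
and $\|(f+\xi^n)_{J_n}\|_\infty=O(1)$. For the proof of (C2), let us fix $\varepsilon>0$ and a step function $\tilde f$ with $\|f-\tilde f\|_\infty<\varepsilon/7$. Further, set $\delta=\mathrm{mpl}(\tilde f)>0$. Now we will consider three different classes of intervals $I\in\mathcal P_{J_n}$, which are characterized by their position relative to $J(\tilde f)$, and estimate $(f+\xi^n)_{J_n}-\tilde f$ uniformly on each class separately.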

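Class 1 consists of intervals $I$ with $J(\tilde f)\cap I=\emptyset$. Since $\tilde f$ is constant on such $I$, we obtain
$$\|1_I(\tilde f-(f+\xi^n)_{J_n})\|_\infty\le\|1_I(\tilde f-f)_{J_n}\|_\infty+\|1_I\,\xi^n_{J_n}\|_\infty\le\|\tilde f-f\|_\infty+o(1)<\varepsilon/7$$
for large enough $n$, uniformly in all such $I$.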

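Class 2 covers intervals $I$ which are not in class 1 but for which there is some interval $\tilde I\in\mathcal P_{J(\tilde f)}$ with $\ell(I\cap\tilde I)>\delta/6$. To apply Lemma A.3(ii), choose an interval $I'\subset I\cap\tilde I$ from $\mathcal B_n$ such that $\rho_H(I',I\cap\tilde I)<1/n$. We find for all $t\in I'$
$$|(f+\xi^n)_{J_n}(t)-\mu_{I'}(f+\xi^n)|\le\sqrt{\frac{2\gamma_n}{\ell(I')}}\le\sqrt{\frac{2\gamma_n}{\delta/6-2/n}},$$
hence
$$\begin{aligned}|(f+\xi^n)_{J_n}(t)-\tilde f(t)|&\le|\mu_{I'}(f)-\tilde f(t)|+|\mu_{I'}(\xi^n)|+\sqrt{\frac{2\gamma_n}{\delta/6-2/n}}\\&\le\frac{\varepsilon}{7}+\sqrt{\frac{C_n\log n/n}{\delta/6-2/n}}+\sqrt{\frac{2\gamma_n}{\delta/6-2/n}}<\frac{\varepsilon}{6}\end{aligned}$$
for large enough $n$ depending only on $(\gamma_n)_{n\in\mathbb N}$, $\delta$, $\varepsilon$. Since $(f+\xi^n)_{J_n}$ is constant on $I$ and $\tilde f$ is constant on $\tilde I$, this implies that for $n$ large enough $\sup_{I\cap\tilde I}|(f+\xi^n)_{J_n}-\tilde f|<\varepsilon/6$ uniformly in $I$.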

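Class 3 contains all intervals $I\in\mathcal P_{J_n}$ which are in neither class 1 nor class 2; such intervals satisfy $\ell(I)<\delta/3$ and $I\cap J(\tilde f)=\{t_0\}$ for a single point $t_0$. Then the neighboring intervals of $I$ in $\mathcal P_{J_n}$ necessarily belong to class 1 or 2. Further, if a neighboring interval $I'$ is in class 2, we know that there is $\tilde I\in\mathcal P_{J(\tilde f)}$ with $\ell(\tilde I\cap I')>\delta/6$ and $\tilde I\cap I\neq\emptyset$, such that $\operatorname{dist}(t_0,\tilde I)=0$. In any case, we find for any interval $\tilde I$ in $\mathcal P_{J(\tilde f)}$ with endpoint $t_0$ and any interval $I'$ neighboring $I$ in $\mathcal P_{J_n}$ with $\tilde I\cap I'\neq\emptyset$ that $\sup_{\tilde I\cap I'}|(f+\xi^n)_{J_n}-\tilde f|<\varepsilon/6$ and thus
$$\big|\mu_{I'}(f+\xi^n)-\mu_{\tilde I\cap I'}(\tilde f)\big|=\big|\mu_{\tilde I\cap I'}\big((f+\xi^n)_{J_n}\big)-\mu_{\tilde I\cap I'}(\tilde f)\big|<\varepsilon/6.$$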
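We choose $t_1$ with $nt_1\in\mathbb N$ and $|t_1-t_0|<1/n$, as well as $I_1=I\cap[0,t_1)$, $I_2=I\cap[t_1,1)$, and $I_1'$, $I_2'$ as the neighboring intervals of $I_j$ in $\mathcal P_{J_n}$, $j=1,2$. Denoting $a=\mu_I(f+\xi^n)$ and $b_j=\mu_{I_j'}(f+\xi^n)$, application of Lemma A.3(iv) yields
$$\ell(I_1)\big(a-\mu_{I_1}(f+\xi^n)\big)^2+\ell(I_2)\big(a-\mu_{I_2}(f+\xi^n)\big)^2\le-\gamma_n+\ell(I_1)\big(b_1-\mu_{I_1}(f+\xi^n)\big)^2+\ell(I_2)\big(b_2-\mu_{I_2}(f+\xi^n)\big)^2.$$
Expanding the squares and using the estimates above together with Lemma A.1, we obtain
$$\begin{aligned}\ell(I_1)\big(a-\mu_{I_1}(f)\big)^2+\ell(I_2)\big(a-\mu_{I_2}(f)\big)^2&\le-\gamma_n+2\ell(I_1)\mu_{I_1}(\xi^n)(a-b_1)+2\ell(I_2)\mu_{I_2}(\xi^n)(a-b_2)\\&\qquad+\ell(I_1)\big(b_1-\mu_{I_1}(f)\big)^2+\ell(I_2)\big(b_2-\mu_{I_2}(f)\big)^2\\&\le2\ell(I_1)\mu_{I_1}(\xi^n)(a-b_1)+2\ell(I_2)\mu_{I_2}(\xi^n)(a-b_2)+\ell(I)\varepsilon^2(1/6+1/7)^2\\&\le2|a-b_1|\sqrt{\ell(I_1)C_n\log n/n}+2|a-b_2|\sqrt{\ell(I_2)C_n\log n/n}+\ell(I)\varepsilon^2(1/6+1/7)^2.\end{aligned}$$
From $\|\xi^n_{J_n}\|_\infty=o(1)$ we find $b_j-a=O(1)$. Since moreover $1/\ell(I)=O(\gamma_n^{-1})$, $\gamma_nn/\log n\to\infty$ and $(1/6+1/7)^2<1/9$, we obtain for large $n$ depending on $\varepsilon$, $\delta$ only
$$\ell(I_1)\big(a-\mu_{I_1}(f)\big)^2+\ell(I_2)\big(a-\mu_{I_2}(f)\big)^2\le\ell(I)\varepsilon^2/9.$$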
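The above results yield for $t'\in I$ that
$$\ell(I_1)\big((f+\xi^n)_{J_n}(t')-\mu_{I_1}(f)\big)^2+\ell(I_2)\big((f+\xi^n)_{J_n}(t')-\mu_{I_2}(f)\big)^2\le\ell(I)\varepsilon^2/9$$
and hence
$$\begin{aligned}\min\big(|(f+\xi^n)_{J_n}(t')-\mu_{I_1}(f)|,\,|(f+\xi^n)_{J_n}(t')-\mu_{I_2}(f)|\big)&\le\varepsilon/3,\\ \min\big(|(f+\xi^n)_{J_n}(t')-\mu_{I_1}(\tilde f)|,\,|(f+\xi^n)_{J_n}(t')-\mu_{I_2}(\tilde f)|\big)&<\varepsilon/2.\end{aligned}$$
This shows that either $\|1_{I\cap[t_0,1)}(\tilde f-(f+\xi^n)_{J_n})\|_\infty<\varepsilon/2$ or $\|1_{I\cap[0,t_0)}(\tilde f-(f+\xi^n)_{J_n})\|_\infty<\varepsilon/2$ holds for large $n$, depending on $\varepsilon$, $\delta$ only.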
Given $J_n$ we define a new partition $\mathcal P_n'$ coarser than $\mathcal P_{J_n}$ by the following procedure. First we join all neighboring intervals of class 1 and again call the resulting intervals class 1. If there are class 1 intervals of length $<\delta/3$ left, there must be a left or a right neighbor which is of class 2 and has an overlap of length $>\delta/3$ with an interval of constancy of $\tilde f$. Then we join the class 1 interval to that neighbor (if there are two, to the left one). At the end, we join each class 3 interval $I$ to the neighbor on the side where the estimate is close to $\tilde f$, that is, to its left neighbor if $\|1_{I\cap[0,t_0)}(\tilde f-(f+\xi^n)_{J_n})\|_\infty<\varepsilon/2$, or else to its right neighbor. The collection of these joined intervals is $\mathcal P_n'$.

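By the results for class 1, 2 and 3 intervals we know for all $I\in\mathcal P_n'$ that $\ell(I)\ge\delta/3$. Further, for each $I\in\mathcal P_n'$ there is $\tilde I\in\mathcal P_{J(\tilde f)}$ such that, for all $I'\in\mathcal P_{J_n}$ with $I'\subset I$, the value of $(f+\xi^n)_{J_n}$ on $I'$ differs from the value of $\tilde f$ on $\tilde I$ by less than $\varepsilon/2$. Thus, defining $f_\varepsilon=\sum_{I\in\mathcal P_n'}\mu_I\big((f+\xi^n)_{J_n}\big)1_I$, we obtain $\|f_\varepsilon-(f+\xi^n)_{J_n}\|_\infty<\varepsilon$ and $\mathrm{mpl}(f_\varepsilon)\ge\delta/3$. Thus (C2) is established, and by Lemma A.4 the set $\{(f+\xi^n)_{J_n}:n\in\mathbb N\}$ is contained in a compact set. This completes the proof of the first assertion. The second assertion follows from the fact that convergence in $D([0,1))$ implies convergence in $L_\infty([0,1])$ if the limit is continuous [Billingsley (1968), page 112]. □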

A.6. The proof of Theorem 2. Fix numbers $k_n\ge1$, the precise magnitude of which will be chosen below. Further, sets $K_n\subset\{1/n,\ldots,(n-1)/n\}$ are chosen such that $f_{K_n}$ is a best approximation of $f$ by a step function from $S_n([0,1))$ with $k_n$ jumps; it exists since the subset of $S_n([0,1))$ containing the functions $g$ with $\#J(g)\le k_n$ and $\|g\|\le2\|f\|$ is compact.

Let $f_{k_n}$ be an approximation of $f$ in $S([0,1))$ with at most $k_n$ jumps for which $\|f_{k_n}-f\|=O(k_n^{-\alpha})$. Further, without loss of generality, we can assume that $f_{k_n}=f_{J(f_{k_n})}$, which implies $\|f_{k_n}\|_\infty\le\|f\|_\infty$. Moving each jump of $f_{k_n}$ to the next $t\in[0,1]$ with $nt\in\mathbb N$, but leaving the value of $f_{k_n}$ unchanged on each plateau, we obtain a step function $\tilde f_n\in S_n([0,1))$ with $\|f_{k_n}-\tilde f_n\|^2\le(4k_n/n)\|f\|_\infty^2$. This shows $\|\tilde f_n-f\|^2=O(k_n^{-2\alpha}+k_n/n)$. Since $f_{K_n}$ is a best approximation, we derive
$$\|f_{K_n}-f\|^2=O\Big(k_n^{-2\alpha}+\frac{k_n}{n}\Big).$$

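By definition $\hat f_n$ is a minimizer of $H^n_{\gamma_n}(\cdot,f+\xi^n)$, and we get
$$H^n_{\gamma_n}(\hat f_n,f+\xi^n)\le H^n_{\gamma_n}(f_{K_n},f+\xi^n).$$
By $\#K_n=k_n$, this implies $\gamma_n\#J_n+\|\hat f_n-f-\xi^n\|^2\le\gamma_nk_n+\|f_{K_n}-f-\xi^n\|^2$ and hence
$$\begin{aligned}\|\hat f_n-f\|^2&\le\gamma_n(k_n-\#J_n)+\|f_{K_n}-f\|^2+2\langle f-f_{K_n},\xi^n\rangle+2\langle\hat f_n-f,\xi^n\rangle\\&=\gamma_n(k_n-\#J_n)+\|f_{K_n}-f\|^2+2\langle\hat f_n-f_{K_n},\xi^n\rangle.\end{aligned}$$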
Now observe that $J(\hat f_n-f_{K_n})\subset J_n\cup K_n$, which gives
$$\begin{aligned}\langle\hat f_n-f_{K_n},\xi^n\rangle&=\langle\hat f_n-f_{K_n},\xi^n_{J_n\cup K_n}\rangle\\&\le\|\hat f_n-f_{K_n}\|\,\|\xi^n_{J_n\cup K_n}\|\\&\le\|\hat f_n-f\|\,\|\xi^n_{J_n\cup K_n}\|+\|f-f_{K_n}\|\,\|\xi^n_{J_n\cup K_n}\|\\&\le\frac{1}{2+\delta}\|\hat f_n-f\|^2+\frac{2+\delta}{4}\|\xi^n_{J_n\cup K_n}\|^2+\frac1\delta\|f-f_{K_n}\|^2+\frac\delta4\|\xi^n_{J_n\cup K_n}\|^2.\end{aligned}$$
The above inequalities yield
$$\|\hat f_n-f\|^2\le\gamma_n(k_n-\#J_n)+\frac{2}{2+\delta}\|\hat f_n-f\|^2+\Big(1+\frac2\delta\Big)\|f_{K_n}-f\|^2+(1+\delta)\|\xi^n_{J_n\cup K_n}\|^2.$$

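Using the estimate (10) with $C_n$ from (9), we obtain for $C'=\delta/(2+\delta)$
$$\begin{aligned}C'\|\hat f_n-f\|^2&\le\gamma_n(k_n-\#J_n)+C''\Big(k_n^{-2\alpha}+\frac{k_n}{n}\Big)+(1+\delta)C_n\frac{\log n}{n}(\#J_n+k_n+1)\\&=k_n\Big(\gamma_n+(1+\delta)C_n\frac{\log n}{n}+\frac{C''}{n}\Big)+\#J_n\Big((1+\delta)C_n\frac{\log n}{n}-\gamma_n\Big)+\frac{C''}{k_n^{2\alpha}}+(1+\delta)C_n\frac{\log n}{n}\end{aligned}$$
for some constant $C''$ depending on $f$. We get from $\gamma_n\ge(1+\delta)12\beta\log n/n$, together with the relation $\limsup_{n\to\infty}C_n\le12\beta$, that $(1+\delta)C_n\log n/n\le\gamma_n$ and $C''/n\le\gamma_n$ for large enough $n$, hence $C'\|\hat f_n-f\|^2\le\gamma_n(3k_n+1)+C''/k_n^{2\alpha}$. Choosing $k_n=\lfloor\gamma_n^{-1/(2\alpha+1)}\rfloor$ we obtain
$$\|\hat f_n-f\|^2=O\big(\gamma_n^{2\alpha/(2\alpha+1)}\big)$$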

and the proof is complete. 
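The choice of $k_n$ balances the two terms $\gamma_n(3k_n+1)$ and $C''k_n^{-2\alpha}$. The Python sketch below, with illustrative values of $\alpha$ and $C''$ that are our own assumptions, evaluates this bound for $k_n=\lfloor\gamma_n^{-1/(2\alpha+1)}\rfloor$ (floored at 1) and divides it by $\gamma_n^{2\alpha/(2\alpha+1)}$. For $\gamma_n\le1$ the ratio stays below $4+C''4^{\alpha}$, because $\gamma_n^{-1/(2\alpha+1)}/2\le k_n\le\gamma_n^{-1/(2\alpha+1)}$; the function name is hypothetical.

```python
import math


def theorem2_bound(gamma, alpha, c2):
    """gamma*(3k+1) + c2*k^(-2 alpha) for k = floor(gamma^(-1/(2 alpha+1))), k >= 1."""
    k = max(1, math.floor(gamma ** (-1.0 / (2 * alpha + 1))))
    return gamma * (3 * k + 1) + c2 * k ** (-2 * alpha)


alpha, c2 = 0.5, 1.0   # illustrative values, not taken from the paper
for gamma in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    rate = gamma ** (2 * alpha / (2 * alpha + 1))
    ratio = theorem2_bound(gamma, alpha, c2) / rate
    print(f"gamma={gamma:.0e}  bound/rate={ratio:.3f}")
```

Any other choice $k_n\asymp\gamma_n^{-1/(2\alpha+1)}$ would only change the constant, not the rate.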

A.7. The proof of Theorem 3, Theorem 1(iii) and Theorem 4(iii).

Proof of Theorem 3(ii). 1. First we will show that
$$\forall t\in J(f)\ \exists t_n\in J_n\ \text{with}\ |t_n-t|<\mathrm{mpl}(f)/3.\tag{13}$$
From part (i) of Theorem 4 and $S([0,1))\subset D([0,1))$ we obtain immediately that $\hat f_n\xrightarrow{D([0,1))}f$. Therefore, there is some random integer $n_0$ such that for all $n\ge n_0$
$$\rho_S(\hat f_n,f)<\min\Big(\min\{|f(t)-f(t-0)|:t\in J(f)\}/2,\ \big|\log\big(1-\tfrac23\,\mathrm{mpl}(f)\big)\big|\Big).\tag{14}$$
The relation (13) is a direct consequence of inequality (14). Assume (13) does not hold. In this case, no Lipschitz function $\lambda\in\Lambda_1$ with $L(\lambda)<|\log(1-\tfrac23\,\mathrm{mpl}(f))|$ could achieve $t\in J(\hat f_n\circ\lambda)$, and hence $\|\hat f_n\circ\lambda-f\|_\infty\ge|f(t)-f(t-0)|/2$ for every such $\lambda$, contradicting (14).

2. Now we will show that for all $t\in J(f)$ there exists a sequence $t_n\in J_n$ such that $|t_n-t|=O(\log n/n)$. For any $t\in J(f)$ let $t_n$ be a point in $J_n$ closest to $t$. We want to apply Lemma A.3(iii). For that goal, suppose for the moment that $t_n<t$ and $f(t)>f(t-0)$. Choose $I_n\in\mathcal P_{J_n}$ as the interval with right end point $t_n$ and set $I_n'=[t_n,s_n)$, where $ns_n\in\mathbb N$ is such that $|s_n-t|<1/n$, as well as $a_n=\mu_{I_n}(\hat f_n)$ and $b_n=\mu_{I_n'}(\hat f_n)$. Then Lemma A.3(iii) shows
$$(b_n-a_n)\Big(\mu_{I_n'}(f+\xi^n)-\frac{a_n+b_n}{2}\Big)\ge0.$$
Clearly, $\hat f_n\xrightarrow{D([0,1))}f$ implies $a_n\to f(t-0)$ and $b_n\to f(t)$, such that
$$\lim_{n\to\infty}(b_n-a_n)=f(t)-f(t-0)>0.$$
Hence, almost surely eventually, $\mu_{I_n'}(f+\xi^n)\ge(a_n+b_n)/2$, and Lemma A.1 gives
$$\mu_{I_n'}(f)\ge\frac{a_n+b_n}{2}-|\mu_{I_n'}(\xi^n)|\ge\frac{a_n+b_n}{2}-\sqrt{\frac{C_n\log n}{n\,\ell(I_n')}}.$$
We know further $\lim_{n\to\infty}\mu_{I_n'}(f)=f(t-0)$, such that almost surely eventually
$$0>\frac{f(t-0)-f(t)}{3}\ge\mu_{I_n'}(f)-\frac{a_n+b_n}{2}\ge-\sqrt{\frac{C_n\log n}{n\,\ell(I_n')}},$$
which implies $\ell(I_n')=O(\log n/n)$ and $|t_n-t|=O(\log n/n)$.

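3. Next we will prove that there exists no sequence $t_n\in J_n$ which satisfies the relation $\limsup_{n\to\infty}(n/\log n)\,\rho_H(\{t_n\},J(f))=\infty$. We consider two adjacent intervals $I,I'\in\mathcal P_{J_n}$ for which there is an $\tilde I\in\mathcal P_{J(f)}$ with $\ell((I\cup I')\setminus\tilde I)=O(\log n/n)$. Then
$$\begin{aligned}|\mu_I(f)-\mu_{I\cap\tilde I}(f)|&=\frac{\big|\ell(I\cap\tilde I)\int_{I\setminus\tilde I}f(u)\,du-\ell(I\setminus\tilde I)\int_{I\cap\tilde I}f(u)\,du\big|}{\ell(I)\,\ell(I\cap\tilde I)}\\&\le2\|f\|_\infty\,\frac{\ell(I\cap\tilde I)\,\ell(I\setminus\tilde I)}{\ell(I)\,\ell(I\cap\tilde I)}=O(1)\,\frac{\ell(I\setminus\tilde I)}{\ell(I)},\end{aligned}$$
and a similar estimate holds for $I'$. By means of $\mu_{I\cap\tilde I}(f)=\mu_{I'\cap\tilde I}(f)$ and $1/\ell(I)=O(1/\gamma_n)$ we obtain
$$\begin{aligned}\big(\mu_I(f)-\mu_{I'}(f)\big)^2&\le\big(1/\ell(I)^2+1/\ell(I')^2\big)\,O\Big(\frac{\log^2n}{n^2}\Big)\\&\le\big(1/\ell(I)+1/\ell(I')\big)\,O\Big(\frac{\log^2n}{\gamma_nn^2}\Big)\\&=\big(1/\ell(I)+1/\ell(I')\big)\,o\Big(\frac{\log n}{n}\Big).\end{aligned}$$
Now Lemma A.3(i), together with Lemma A.1 for the noise terms, implies
$$\gamma_n\le\frac{\ell(I)\,\ell(I')}{\ell(I)+\ell(I')}\big(\mu_I(f+\xi^n)-\mu_{I'}(f+\xi^n)\big)^2=O\Big(\frac{\log n}{n}\Big).$$
This contradicts $\gamma_nn/\log n\to\infty$. Thus, almost surely, there are only finitely many $n$ for which there are two adjacent intervals $I,I'\in\mathcal P_{J_n}$ and $\tilde I\in\mathcal P_{J(f)}$ with $\ell((I\cup I')\setminus\tilde I)=O(\log n/n)$. Consequently, $\rho_H(J_n,J(f))=O(\log n/n)$, which implies the statement. □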
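Theorem 3 measures the distance between the estimated and the true jump sets by $\rho_H$. Assuming $\rho_H$ denotes the usual Hausdorff distance between finite subsets of $[0,1)$, the Python sketch below computes it directly; the function name and the example sets are hypothetical.

```python
import math


def hausdorff(a, b):
    """rho_H(A, B) for finite nonempty sets of reals: the largest distance
    from a point of one set to the nearest point of the other set."""
    if not a or not b:
        raise ValueError("both sets must be nonempty")
    d_ab = max(min(abs(s - t) for t in b) for s in a)
    d_ba = max(min(abs(s - t) for s in a) for t in b)
    return max(d_ab, d_ba)


n = 1000
true_jumps = [0.3, 0.7]                # J(f), illustrative
estimated = [0.302, 0.699, 0.7015]     # J_n with a spurious jump next to 0.7
print(hausdorff(estimated, true_jumps), "vs log(n)/n =", math.log(n) / n)
```

Step 2 bounds the distance from each true jump to $J_n$, and step 3 the distance from each estimated jump to $J(f)$; together they control $\rho_H(J_n,J(f))$. As the example shows, $\rho_H$ alone does not detect a spurious extra jump close to a true one, which is why step 4 is needed to prove $\#J_n=\#J(f)$ eventually.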

Proof of Theorem 3(i). 4. Suppose now there are $s_n,t_n\in J_n$, $s_n\neq t_n$, with $s_n\to t$ and $t_n\to t$ for some $t\in J(f)$. Then we have by the previous result that $|t_n-s_n|=O(\log n/n)$ as well as $1/|t_n-s_n|=O(1/\gamma_n)$. This gives us $n\gamma_n/\log n=O(1)$, contradicting $n\gamma_n/\log n\to\infty$. Thus $\#J_n=\#J(f)$ eventually. □

Proof of Theorem 3(iii). 5. For this statement, observe that in the special situation considered in step 2, it is not necessary to assume $|s_n-t|<1/n$. Hence for any $s_n\in[t_n,t)$ with $ns_n\in\mathbb N$ we have almost surely eventually
$$\mu_{[t_n,s_n)}(\xi^n)\ge\frac{f(t)-f(t-0)}{3},$$
conditional on $t_n<t$. Denote by $p$ the largest integer such that $p/n<t-1/n$. Using the exponential inequality [cf. Petrov (1975), Sections 3 and 4]
$$P\Big(\sum_{i=1}^nu_i\xi_i^n\ge z\Big)\le\exp\Big(-\frac{z^2}{4\beta\sum_{i=1}^nu_i^2}\Big)$$
for triangular arrays fulfilling Condition (A) and all numbers $u_i$, $i=1,\ldots,n$, we obtain for all $k'\in\mathbb N$
$$\begin{aligned}P\big(\{k'/n<t-t_n\le(k'+1)/n\}\big)&\le P\Big(\mu_{[(p+1-k')/n,\,(p+1-k'+i)/n)}(\xi^n)\ge\frac{f(t)-f(t-0)}{3}\ \text{for all }i=1,\ldots,k'\Big)\\&=P\Big(\frac{\xi^n_{p-k'+2}+\cdots+\xi^n_{p-k'+i+1}}{i}\ge\frac{f(t)-f(t-0)}{3}\ \text{for all }i=1,\ldots,k'\Big)\\&\le P\Big(\frac{\xi^n_{p-k'+2}+\cdots+\xi^n_{p+1}}{k'}\ge\frac{f(t)-f(t-0)}{3}\Big)\\&\le\exp\Big(-\frac{k'z^2}{4\beta}\Big)=\Big(\exp\Big(-\frac{z^2}{4\beta}\Big)\Big)^{k'}=:q^{k'},\end{aligned}$$
where $z=(f(t)-f(t-0))/3$. Note that $q<1$ depends on $f(t)-f(t-0)$ and $\beta$ only. Clearly, we can use a similar argument if $f(t-0)>f(t)$ or $t_n>t$. Summing up these inequalities, we obtain $P(\{|t-t_n|>k/n\})\le2q^k/(1-q)$ and
$$P\big(\{\rho_H(J_n,J(f))>k/n\}\big)\le2\,\#J(f)\,q^k/(1-q).$$
This shows $\lim_{k\to\infty}\limsup_{n\to\infty}P(\{\rho_H(J_n,J(f))>k/n\})=0$, or, in other words, $\rho_H(J_n,J(f))=O_P(n^{-1})$. □
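The summation step can be written out as follows; it only restates the per-block bound $q^{k'}$ from above, summed over $k'\ge k$ and over the two possible sides of $t$ (the cases $t_n<t$ and $t_n>t$).

```latex
P\big(\{|t-t_n|>k/n\}\big)
  \le \sum_{k'=k}^{\infty}\big(q^{k'}+q^{k'}\big)
  = 2q^k\sum_{m=0}^{\infty}q^{m}
  = \frac{2q^k}{1-q}.
```

Since $q<1$ depends only on $f(t)-f(t-0)$ and $\beta$, the right-hand side tends to 0 as $k\to\infty$ without reference to $n$, which is what the $O_P(n^{-1})$ statement uses.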

Proof of Theorem 1(iii), Theorem 4(iii). 6. By 4 and 5, we may choose $n$ so large that $\#J_n=\#J(f)$ and $\rho_H(J_n,J(f))<\mathrm{mpl}(f)/3$. Then there is a unique one-to-one map $\varphi_n:J(f)\to J_n$ for which $\sum_{t\in J(f)}|t-\varphi_n(t)|$ is minimal. We derive $\varphi_n(t)-t=O(\log n/n)$ for all $t\in J(f)$. Extend now $\varphi_n$ by $\varphi_n(0)=0$ and $\varphi_n(1)=1$. For $[s,t)\in\mathcal P_{J(f)}$ we thus get
$$\big\|1_{[\varphi_n(s),\varphi_n(t))}-1_{[s,t)}\big\|^2=O\Big(\frac{\log n}{n}\Big).$$
Further, $\|f\|_\infty<\infty$ yields $|\mu_{[\varphi_n(s),\varphi_n(t))}(f)-\mu_{[s,t)}(f)|=O(\log n/n)$. Lemma A.1 implies that $\mu_{[\varphi_n(s),\varphi_n(t))}(\xi^n)=O(\sqrt{\log n/n})$, such that $\|\hat f_n-f\|=O(\sqrt{\log n/n})$, which yields the first part of Theorem 1(iii) and
$$\max_{[s,t)\in\mathcal P_{J(f)}}\big|\mu_{[\varphi_n(s),\varphi_n(t))}(f+\xi^n)-\mu_{[s,t)}(f)\big|=O\Big(\sqrt{\frac{\log n}{n}}\Big).\tag{15}$$
We define an extension $\lambda_n\in\Lambda_1$ of $\varphi_n$ by linear interpolation. From the above, we obtain the estimate $\|\hat f_n-f\circ\lambda_n\|_\infty=O(\sqrt{\log n/n})$. Furthermore,
$$L(\lambda_n)=\max_{[s,t)\in\mathcal P_{J(f)}}\Big|\log\frac{\varphi_n(t)-\varphi_n(s)}{t-s}\Big|=O\Big(\frac{\log n}{n}\Big),$$
such that $\rho_S(\hat f_n,f)=O(\sqrt{\log n/n})$.
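Step 6 controls the Skorokhod distance through the time change $\lambda_n$ obtained by linear interpolation of $\varphi_n$. The Python sketch below evaluates $L(\lambda)=\sup_{s<t}|\log((\lambda(t)-\lambda(s))/(t-s))|$ for such a piecewise linear map. We assume, as is the case in the situation of step 6 ($\#J_n=\#J(f)$ and $\rho_H(J_n,J(f))<\mathrm{mpl}(f)/3$), that $\varphi_n$ simply pairs the sorted jump sets; the function name is hypothetical.

```python
import math


def time_change_norm(true_jumps, est_jumps):
    """L(lambda) for the piecewise linear lambda with lambda(0) = 0, lambda(1) = 1
    and lambda(t_k) = phi(t_k), where phi pairs the sorted jump sets.

    For an increasing piecewise linear lambda, every difference quotient
    (lambda(t) - lambda(s)) / (t - s) is a weighted average of segment slopes,
    so the supremum of |log(...)| is attained on a single segment.
    """
    if len(true_jumps) != len(est_jumps):
        raise ValueError("need #J_n = #J(f)")
    xs = [0.0] + sorted(true_jumps) + [1.0]
    ys = [0.0] + sorted(est_jumps) + [1.0]
    return max(abs(math.log((y1 - y0) / (x1 - x0)))
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


n = 10_000
shift = math.log(n) / n      # jump displacement of order log(n)/n
print(time_change_norm([0.3, 0.7], [0.3 + shift, 0.7 - shift]))
```

Displacing each jump by an amount of order $\log n/n$ yields $L(\lambda)$ of the same order, in line with the estimate for $L(\lambda_n)$ above.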

7. By direct calculations we obtain from (15) and Lemma A.1 that
$$\max_{I\in\mathcal P_{J_n}}|\mu_I(\xi^n)|=O_P(n^{-1/2}).$$
Using this estimate and $\rho_H(J_n,J(f))=O_P(n^{-1})$ in the same way as the almost sure rate in step 6, we obtain that $\rho_S(\hat f_n,f)$ and $\|\hat f_n-f\|$ are of order $O_P(1/\sqrt n)$. □

A.8. The proof of Theorem 5. It is sufficient to show that
$$\lim_{n\to\infty}P\big(\#J(\hat f^{MR}_n)=\#J(\hat f_n)\big)=1.$$

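Assume there exists some subsequence $(n_k)$ such that $\#J(\hat f^{MR}_{n_k})<\#J(f)$ for all $n_k$. As a step function with $\#J(f)$ jumps cannot be approximated by a sequence of functions with fewer jumps, there exists a sequence of connected intervals $I_{n_k}$ with $I_{n_k}\in\mathcal B_{n_k}$ such that $\liminf_{k\to\infty}\ell(I_{n_k})>\varepsilon_1>0$ and, for $\mathcal I_{n_k}=\{i:x_i^{n_k}\in I_{n_k}\}$,
$$\Big|\frac{1}{\#\mathcal I_{n_k}}\sum_{i\in\mathcal I_{n_k}}\big(\bar f_i^{n_k}-\hat f^{MR}_{n_k}(x_i^{n_k})\big)\Big|>\varepsilon_2>0.$$
Consequently, by Lemma A.1, for large $k$
$$\frac{\big|\sum_{i\in\mathcal I_{n_k}}\big(Y_i^{n_k}-\hat f^{MR}_{n_k}(x_i^{n_k})\big)\big|}{\sqrt{\#\mathcal I_{n_k}}}\ge\varepsilon_2\sqrt{\varepsilon_1n_k}-O\big(\sqrt{\log n_k}\big)\quad P\text{-a.s.}$$
This implies that for large $k$ the MR-criterion is not satisfied. By Theorem 3(i) we have $P(\#J(\hat f_n)=\#J(f))\to1$ for $n\to\infty$. Hence $P(\#J(\hat f^{MR}_n)\ge\#J(\hat f_n))\to1$.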

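It remains to show that $\hat f^{MR}_n$ has asymptotically at most as many jumps as $\hat f_n$. Observe that
$$\max_{1\le j\le k\le n}\frac{\big|\sum_{i=j}^k\big(Y_i^n-\hat f_n(x_i^n)\big)\big|}{\sqrt{k-j+1}}\le\max_{1\le j\le k\le n}\frac{\big|\sum_{i=j}^k\xi_i^n\big|+\big|\sum_{i=j}^k\big(\hat f_n(x_i^n)-\bar f_i^n\big)\big|}{\sqrt{k-j+1}}.\tag{16}$$
By the Cauchy-Schwarz inequality and Theorem 1(iii) we have for $1\le j\le k\le n$
$$\begin{aligned}\frac{\big|\sum_{i=j}^k\big(\hat f_n(x_i^n)-\bar f_i^n\big)\big|}{\sqrt{k-j+1}}&=\frac{n\,\big|\langle\hat f_n-\bar f^n,1_{[j/n,(k+1)/n)}\rangle\big|}{\sqrt{k-j+1}}\\&\le n\,\frac{\sqrt{(k-j+1)/n}}{\sqrt{k-j+1}}\big(\|\hat f_n-f\|+\|\bar f^n-f\|\big)\\&=\sqrt n\big(\|\hat f_n-f\|+\|\bar f^n-f\|\big)=O_P(1)\end{aligned}$$
uniformly in $j,k$, where $\bar f^n$ denotes the step function with the values $\bar f_i^n$ on the grid cells. Since the left-hand side of the next display equals $\sqrt{C_n\log n}$ with $C_n$ from (9), Lemma A.2 implies
$$\max_{1\le j\le k\le n}\frac{\big|\sum_{i=j}^k\xi_i^n\big|}{\sqrt{k-j+1}}=\sigma\sqrt{2\log n}+o_P\big(\sqrt{\log n}\big).$$
Applying the results above to (16), and to the corresponding lower bound obtained from the reverse triangle inequality, we arrive at
$$\max_{1\le j\le k\le n}\frac{\big|\sum_{i=j}^k\big(Y_i^n-\hat f_n(x_i^n)\big)\big|}{\sqrt{k-j+1}}=\sigma\sqrt{2\log n}+o_P\big(\sqrt{\log n}\big).$$
Since $\hat\sigma$ is a consistent estimate of $\sigma$, this implies that the probability that $\hat f_n$ satisfies the MR-criterion tends to 1 as $n$ goes to infinity. As $\hat\gamma_n$ is chosen maximal such that the MR-criterion is satisfied, we can conclude $P(\hat\gamma_n\ge\gamma_n)\to1$ and consequently $P(\#J(\hat f^{MR}_n)\le\#J(\hat f_n))\to1$, which proves
$$\lim_{n\to\infty}P\big(\#J(\hat f^{MR}_n)=\#J(\hat f_n)\big)=1$$
and hence the claim. □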
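The multiresolution statistic on the left-hand side of (16) can be computed directly from the residuals $Y_i^n-\hat f_n(x_i^n)$. The Python sketch below does so by brute force over all index blocks. It does not implement the full MR-criterion or the data-driven choice of $\hat\gamma_n$, whose exact threshold is specified in the main text; the function name is hypothetical. Applied to pure noise, the statistic equals $\sqrt{C_n\log n}$ with $C_n$ from (9), which is why Lemma A.2 yields the leading term $\sigma\sqrt{2\log n}$.

```python
import math
import random


def mr_statistic(residuals):
    """max over 1 <= j <= k <= n of |r_j + ... + r_k| / sqrt(k - j + 1)."""
    prefix = [0.0]
    for r in residuals:
        prefix.append(prefix[-1] + r)
    n = len(residuals)
    best = 0.0
    for j in range(n):
        for k in range(j + 1, n + 1):
            best = max(best, abs(prefix[k] - prefix[j]) / math.sqrt(k - j))
    return best


rng = random.Random(2)
n, sigma = 500, 1.0
# Residuals of a fit that recovered the signal exactly reduce to the noise.
noise = [rng.gauss(0.0, sigma) for _ in range(n)]
stat = mr_statistic(noise)
lead = sigma * math.sqrt(2 * math.log(n))
print(f"MR statistic {stat:.2f}, leading term sigma*sqrt(2 log n) = {lead:.2f}")
```

At moderate $n$ the two printed numbers need not be close, since the approximation only holds up to $o_P(\sqrt{\log n})$.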